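Publisher's Note


Now that China has joined the WTO and the economy is globalizing,
Higher Education Press has undertaken a planned, large-scale program to
import outstanding foreign mathematics textbooks. The program is meant
to meet the needs of Chinese universities in training innovative talent
of all kinds, to promote the bilingual teaching advocated by the
Ministry of Education, and to support the Ministry's "Project for
Teaching Quality and Teaching Reform in Higher Education Institutions"
and its "Quality Courses" initiative.

Higher Education Press has been in extensive contact with foreign
publishers including Pearson Education, John Wiley & Sons, McGraw-Hill,
and Thomson Learning. On the recommendation of these publishers and
with the help of experts in China, more than 100 titles were submitted
for rights acquisition. After receiving sample copies, we invited
front-line university teachers, experts, and scholars in China to
review the original textbooks and, with reference to the curricula and
actual teaching conditions of the relevant disciplines in China,
selected this set of outstanding textbooks for publication.

These textbooks generally share the following features: (1) most were
published within the last 3 years, are widely used internationally, and
carry considerable authority among textbooks of their kind; (2) they
are in high editions, have been tested by many years of teaching
practice, and their content is thorough, accurate, and up to date;
(3) they come with complete sets of teaching resources, a great
convenience for teachers and students; (4) their illustrations are
attractive and plentiful and complement the text; (5) their language is
concise, fluent, and readable, which suits students in
non-English-speaking countries.

The series includes Thomas' Calculus (10th edition, Pearson) by Finney,
Weir, and others, whose character can be summarized as "traditional in
character, rich in the spirit of innovation." Since its first edition
in the 1950s, a new edition has appeared every four or five years on
average, and the book has remained popular in Western classrooms for
more than 50 years. Its authors combine high academic standing with a
love of teaching and long front-line teaching service: Professor
G. B. Thomas, now nearly 90, worked at MIT for many years and has rich
teaching experience; Professor Finney also worked at MIT for 10 years;
Weir chairs the committee of the U.S. mathematical modeling contest.
Stewart's Calculus (5th edition, Thomson Learning), a textbook package
with rich teaching resources, is the best-selling original-edition
calculus textbook internationally, with global sales of more than
400,000 copies in 2003; in the United States it holds about 50% to 60%
of the calculus textbook market, and its users include Yale and other
well-known institutions as well as more than 600 other colleges. The
series also includes Anton's classic Linear Algebra and Its
Applications (8th edition, Wiley) and Jay L. Devore's Probability and
Mathematical Statistics (5th edition, Thomson Learning), among others.
Higher Education Press has done a great deal of careful work to lower
the price of the imported textbooks, and this set is authoritative,
systematic, up to date, and economical.

By reprinting, translating, and adapting these outstanding textbooks,
we aim, on the one hand, to keep analyzing, learning from, and
absorbing the strengths of outstanding foreign textbooks, to draw on
the production experience of foreign publishers, and to raise the
standard of supporting materials for our own textbooks, so that
textbook development in Chinese universities reaches a new level. At
the same time, we will try to organize overseas and Chinese authors to
co-write foreign-language textbooks for basic mathematics courses, and
will invite experts in China to adapt some outstanding foreign
textbooks to the actual teaching environment in China.

After this set of textbooks is published, we will carry out
large-scale publicity and training in line with the bilingual teaching
plans of universities, and will promptly recommend the series to them.
We sincerely hope that university teachers and students will offer
their comments and suggestions as they use the books. If you know of a
good textbook worth importing, please contact the Higher Science
Division of Higher Education Press.

Telephone: 010-58581384; E-mail: xuke@hep.com.cn.

Higher Education Press
April 20, 2004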
Preface


When Allen T. Craig died in late November 1978, I lost my advisor, 
mentor, colleague, and very dear friend. Due to his health, Allen did 
nothing on the fourth edition and, of course, this revision is mine alone. 
There is, however, a great deal of Craig’s influence in this book. As a 
matter of fact, when I would debate with myself whether or not to 
change something, I could hear Allen saying, “It’s very good now, Bob; 
don’t mess it up.” Often, I would follow that advice. 

Nevertheless, there were a number of things that needed to be done. 
I have had many suggestions from my colleagues at the University of 
Iowa; in particular, Jim Broffitt, Jon Cryer, Dick Dykstra, Subhash
Kochar (a visitor), Joe Lang, Russ Lenth, and Tim Robertson 
provided me with a great deal of constructive criticism. In addition, 
three reviewers suggested a number of other topics to include. I have 
also had statisticians and students from around the world write to me 
about possible improvements. Elliot Tanis, my good friend and 
co-author of our Probability and Statistical Inference, gave me 
permission to use a few of the figures, examples, and exercises used in 
that book. I truly thank these people, who have been so helpful and 
generous. 

Clearly, I could not use all of these ideas. As a matter of fact, I 
resisted adding “real” problems, although a few slipped into the 
exercises. Allen and I wanted to write about the mathematics of 
statistics, and I have followed that guideline. Hopefully, without those 
problems, there is still enough motivation to study mathematical 
statistics in this book. In addition, there are a number of excellent 
books on applied statistics, and most students have had a little
exposure to applications before studying this book. 

The major differences between this edition and the preceding one 
are the following: 

• There is a better discussion of assigning probabilities to events, 
including introducing independent events and Bayes’ theorem in the 
text. 

• The consideration of random variables and their expectations is 
greatly improved. 

• Sufficient statistics are presented earlier (as was true in the very 
early editions of the book), and minimal sufficient statistics are 
introduced. 

• Invariance of the maximum likelihood estimators and invariant 
location- and scale-statistics are considered. 

• The expressions “convergence in distribution” and “convergence in 
probability” are used, and the delta method for finding asymptotic 
distributions is spelled out. 

• Fisher information is given, and the Rao-Cramer lower bound is 
presented for an estimator of a function of a parameter, not just for 
an unbiased estimator. 

• The asymptotic distribution of the maximum likelihood estimator 
is included. 

• The discussion of Bayesian procedures has been improved and 
expanded somewhat. 

There are also a number of little items that should improve the 
understanding of the text: the expressions var and cov are used; the 
convolution formula is in the text; there is more explanation of 
p-values; the relationship between two-sided tests and confidence
intervals is noted; the indicator function is used when helpful; the 
multivariate normal distribution is given earlier (for those with an 
appropriate background in matrices, although this is still not necessary 
in the use of this book); and there is more on conditioning. 

I believe that the order of presentation has been improved; in 
particular, sufficient statistics are presented earlier. More exercises 
have been introduced; and at the end of each chapter, there are several 
additional exercises that have not been ordered by section or by 
difficulty (several students had suggested this). Moreover, answers 
have not been given for any of these additional exercises because I 
thought some instructors might want to use them for questions on 
examinations. Finally, the index has been improved greatly, another 
suggestion of students as well as of some of my colleagues at Iowa. 

There is really enough material in this book for a three-semester 
sequence. However, most instructors find that selections from the first 
five chapters provide a good one-semester background in the 
probability needed for the mathematical statistics based on selections 
from the remainder of the book, which certainly would include most 
of Chapters 6 and 7. 

I am obligated to Catherine M. Thompson and Maxine Merrington 
and to Professor E. S. Pearson for permission to include Tables II and 
V, which are abridgments and adaptations of tables published in 
Biometrika. I wish to thank Oliver & Boyd Ltd., Edinburgh, for 
permission to include Table IV, which is an abridgment and adaptation 
of Table III from the book Statistical Tables for Biological,
Agricultural, and Medical Research by the late Professor Sir Ronald A.
Fisher, Cambridge, and Dr. Frank Yates, Rothamsted.

Finally, I would like to dedicate this edition to the memory of Allen
Craig and my wife, Carolyn, who died June 25, 1990. Without the love
and support of these two caring persons, I could not have done as much
professionally as I have. My friends in Iowa City and my children
(Mary, Barbara, Allen, and Robert) have given me the strength to
continue. After four previous efforts, I really hope that I have come
close to “getting it right this fifth time.” I will let the readers be the
judge. 


R. V. H. 




CHAPTER 1


Probability and Distributions


1.1 Introduction 

Many kinds of investigations may be characterized in part by 
the fact that repeated experimentation, under essentially the same 
conditions, is more or less standard procedure. For instance, in medical 
research, interest may center on the effect of a drug that is to be 
administered; or an economist may be concerned with the prices of 
three specified commodities at various time intervals; or the 
agronomist may wish to study the effect that a chemical fertilizer has 
on the yield of a cereal grain. The only way in which an investigator 
can elicit information about any such phenomenon is to perform his 
experiment. Each experiment terminates with an outcome. But it 
is characteristic of these experiments that the outcome cannot be 
predicted with certainty prior to the performance of the experiment. 

Suppose that we have such an experiment, the outcome of which 
cannot be predicted with certainty, but the experiment is of such a 
nature that a collection of every possible outcome can be described 
prior to its performance. If this kind of experiment can be repeated 
under the same conditions, it is called a random experiment, and the
collection of every possible outcome is called the experimental space 
or the sample space. 

Example 1. In the toss of a coin, let the outcome tails be denoted by T and 
let the outcome heads be denoted by H. If we assume that the coin may be
repeatedly tossed under the same conditions, then the toss of this coin is an 
example of a random experiment in which the outcome is one of the two 
symbols T and H; that is, the sample space is the collection of these two 
symbols. 

Example 2. In the cast of one red die and one white die, let the outcome 
be the ordered pair (number of spots up on the red die, number of spots up 
on the white die). If we assume that these two dice may be repeatedly cast 
under the same conditions, then the cast of this pair of dice is a random 
experiment and the sample space consists of the following 36 ordered pairs: 
(1, 1), ..., (1, 6), (2, 1), ..., (2, 6), ..., (6, 6).
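
The sample space of Example 2 is small enough to list in full. The following is a minimal Python sketch of that enumeration; the names `faces` and `sample_space` are illustrative and do not come from the text.

```python
from itertools import product

# Each outcome is an ordered pair:
# (spots up on the red die, spots up on the white die).
faces = range(1, 7)
sample_space = list(product(faces, repeat=2))

print(len(sample_space))    # number of ordered pairs in the sample space
print(sample_space[:6])     # the pairs with a 1 on the red die
```

Because the pairs are ordered, (1, 6) and (6, 1) are counted as different outcomes.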

Let $\mathcal{C}$ denote a sample space, and let C represent a part of $\mathcal{C}$. If,
upon the performance of the experiment, the outcome is in C, we shall
say that the event C has occurred. Now conceive of our having made
N repeated performances of the random experiment. Then we can
count the number f of times (the frequency) that the event C actually
occurred throughout the N performances. The ratio f/N is called the 
relative frequency of the event C in these N experiments. A relative 
frequency is usually quite erratic for small values of N, as you can 
discover by tossing a coin. But as N increases, experience indicates that 
we associate with the event C a number, say p, that is equal or 
approximately equal to that number about which the relative 
frequency seems to stabilize. If we do this, then the number p can be 
interpreted as that number which, in future performances of the 
experiment, the relative frequency of the event C will either equal or 
approximate. Thus, although we cannot predict the outcome of a 
random experiment, we can, for a large value of N, predict 
approximately the relative frequency with which the outcome will be 
in C. The number p associated with the event C is given various names. 
Sometimes it is called the probability that the outcome of the random 
experiment is in C; sometimes it is called the probability of the event 
C; and sometimes it is called the probability measure of C. The context 
usually suggests an appropriate choice of terminology. 

Example 3. Let $\mathcal{C}$ denote the sample space of Example 2 and let C be the
collection of every ordered pair of $\mathcal{C}$ for which the sum of the pair is
equal to seven. Thus C is the collection (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and
(6, 1). Suppose that the dice are cast N = 400 times and let f, the frequency
of a sum of seven, be f = 60. Then the relative frequency with which the
outcome was in C is f/N = 60/400 = 0.15. Thus we might associate with C a
number p that is close to 0.15, and p would be called the probability of the
event C.
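
The stabilizing of the relative frequency f/N can be watched in a small simulation. This is a sketch under an assumption the example does not make: the dice are simulated as fair, so each face is equally likely. The function name `relative_frequency` and the fixed seed are illustrative choices.

```python
import random

def relative_frequency(n_casts, seed=0):
    """Cast two simulated dice n_casts times; return f/N for 'sum is seven'."""
    rng = random.Random(seed)   # fixed seed only so that runs are repeatable
    f = 0                       # frequency of the event C
    for _ in range(n_casts):
        red, white = rng.randint(1, 6), rng.randint(1, 6)
        if red + white == 7:
            f += 1
    return f / n_casts

for n in (10, 400, 100_000):
    print(n, relative_frequency(n))
```

For small N the printed ratio is erratic; for large N it settles near a single number, which is the number p that the text associates with the event C.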

Remark. The preceding interpretation of probability is sometimes 
referred to as the relative frequency approach, and it obviously depends upon 
the fact that an experiment can be repeated under essentially identical 
conditions. However, many persons extend probability to other situations by 
treating it as a rational measure of belief. For example, the statement $p = \frac{2}{5}$
would mean to them that their personal or subjective probability of the event
C is equal to $\frac{2}{5}$. Hence, if they are not opposed to gambling, this could be
interpreted as a willingness on their part to bet on the outcome of C so that
the two possible payoffs are in the ratio $p/(1 - p) = \frac{2}{5}/\frac{3}{5} = \frac{2}{3}$. Moreover, if they
truly believe that $p = \frac{2}{5}$ is correct, they would be willing to accept either side
of the bet: (a) win 3 units if C occurs and lose 2 if it does not occur, or (b) 
win 2 units if C does not occur and lose 3 if it does. However, since the 
mathematical properties of probability given in Section 1.3 are consistent with 
either of these interpretations, the subsequent mathematical development 
does not depend upon which approach is used. 
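
The two bets in the Remark are acceptable from either side only when both have zero expected gain. The sketch below checks this with exact fractions, assuming the subjective probability p = 2/5, which is the value that makes "win 3, lose 2" break even; the helper `expected_gain` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

def expected_gain(p_win, win, lose):
    """Expected gain of a bet that wins `win` with probability p_win,
    and loses `lose` otherwise."""
    return p_win * win - (1 - p_win) * lose

p = Fraction(2, 5)                              # assumed subjective probability of C
odds = p / (1 - p)                              # payoff ratio p/(1 - p)
side_a = expected_gain(p, win=3, lose=2)        # (a) bet on C
side_b = expected_gain(1 - p, win=2, lose=3)    # (b) bet against C

print(odds, side_a, side_b)
```

Any other value of p would make one side of the bet favorable and the other unfavorable, so a person holding that belief would not accept both sides.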

The primary purpose of having a mathematical theory of statistics 
is to provide mathematical models for random experiments. Once a 
model for such an experiment has been provided and the theory worked 
out in detail, the statistician may, within this framework, make 
inferences (that is, draw conclusions) about the random experiment. 
The construction of such a model requires a theory of probability. 
One of the more logically satisfying theories of probability is that 
based on the concepts of sets and functions of sets. These concepts 
are introduced in Section 1.2. 

1.2 Set Theory 

The concept of a set or a collection of objects is usually left 
undefined. However, a particular set can be described so that there is 
no misunderstanding as to what collection of objects is under 
consideration. For example, the set of the first 10 positive integers is 
sufficiently well described to make clear that the numbers $\frac{3}{4}$ and 14 are
not in the set, while the number 3 is in the set. If an object belongs to
a set, it is said to be an element of the set. For example, if A denotes
the set of real numbers x for which $0 \le x \le 1$, then $\frac{3}{4}$ is an element of
the set A. The fact that $\frac{3}{4}$ is an element of the set A is indicated by
writing $\frac{3}{4} \in A$. More generally, $a \in A$ means that a is an element of the
set A.

The sets that concern us will frequently be sets of numbers. 
However, the language of sets of points proves somewhat more 
convenient than that of sets of numbers. Accordingly, we briefly
indicate how we use this terminology. In analytic geometry considerable
emphasis is placed on the fact that to each point on a line (on which
an origin and a unit point have been selected) there corresponds one
and only one number, say x; and that to each number x there
corresponds one and only one point on the line. This one-to-one
correspondence between the numbers and points on a line enables us
to speak, without misunderstanding, of the “point x” instead of the
“number x.” Furthermore, with a plane rectangular coordinate system
and with x and y numbers, to each symbol (x, y) there corresponds one
and only one point in the plane; and to each point in the plane there
corresponds but one such symbol. Here again, we may speak of the
“point (x, y),” meaning the “ordered number pair x and y.” This
convenient language can be used when we have a rectangular
coordinate system in a space of three or more dimensions. Thus the
“point $(x_1, x_2, \ldots, x_n)$” means the numbers $x_1, x_2, \ldots, x_n$ in the order
stated. Accordingly, in describing our sets, we frequently speak of a set
of points (a set whose elements are points), being careful, of course, to
describe the set so as to avoid any ambiguity. The notation
$A = \{x : 0 \le x \le 1\}$ is read “A is the one-dimensional set of points x
for which $0 \le x \le 1$.” Similarly, $A = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$
can be read “A is the two-dimensional set of points (x, y) that
are interior to, or on the boundary of, a square with opposite vertices
at (0, 0) and (1, 1).” We now give some definitions (together with
illustrative examples) that lead to an elementary algebra of sets
adequate for our purposes.

Definition 1. If each element of a set $A_1$ is also an element of set $A_2$,
the set $A_1$ is called a subset of the set $A_2$. This is indicated by writing
$A_1 \subset A_2$. If $A_1 \subset A_2$ and also $A_2 \subset A_1$, the two sets have the same
elements, and this is indicated by writing $A_1 = A_2$.

Example 1. Let $A_1 = \{x : 0 \le x \le 1\}$ and $A_2 = \{x : -1 \le x \le 2\}$. Here
the one-dimensional set $A_1$ is seen to be a subset of the one-dimensional set
$A_2$; that is, $A_1 \subset A_2$. Subsequently, when the dimensionality of the set is clear,
we shall not make specific reference to it. 
Example 2. Let $A_1 = \{(x, y) : 0 \le x = y \le 1\}$ and
$A_2 = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$. Since the elements of $A_1$ are the
points on one diagonal of the square, then $A_1 \subset A_2$.

Definition 2. If a set A has no elements, A is called the null set. This
is indicated by writing $A = \emptyset$.

Definition 3. The set of all elements that belong to at least one of
the sets $A_1$ and $A_2$ is called the union of $A_1$ and $A_2$. The union of $A_1$ and
$A_2$ is indicated by writing $A_1 \cup A_2$. The union of several sets
$A_1, A_2, A_3, \ldots$ is the set of all elements that belong to at least one of
the several sets. This union is denoted by $A_1 \cup A_2 \cup A_3 \cup \cdots$ or by
$A_1 \cup A_2 \cup \cdots \cup A_k$ if a finite number k of sets is involved.

Example 3. Let $A_1 = \{x : x = 0, 1, \ldots, 10\}$ and $A_2 = \{x : x = 8, 9, 10, 11,$
or $11 < x \le 12\}$. Then $A_1 \cup A_2 = \{x : x = 0, 1, \ldots, 8, 9, 10, 11,$ or
$11 < x \le 12\} = \{x : x = 0, 1, \ldots, 8, 9, 10,$ or $11 \le x \le 12\}$.

Example 4. Let $A_1$ and $A_2$ be defined as in Example 1. Then $A_1 \cup A_2 = A_2$.

Example 5. Let $A_2 = \emptyset$. Then $A_1 \cup A_2 = A_1$ for every set $A_1$.

Example 6. For every set A, $A \cup A = A$.

Example 7. Let

$$A_k = \left\{x : \frac{1}{k+1} \le x \le 1\right\}, \qquad k = 1, 2, 3, \ldots.$$

Then $A_1 \cup A_2 \cup A_3 \cup \cdots = \{x : 0 < x \le 1\}$. Note that the number zero is not
in this set, since it is not in one of the sets $A_1, A_2, A_3, \ldots$.

Definition 4. The set of all elements that belong to each of the sets
$A_1$ and $A_2$ is called the intersection of $A_1$ and $A_2$. The intersection of
$A_1$ and $A_2$ is indicated by writing $A_1 \cap A_2$. The intersection of several
sets $A_1, A_2, A_3, \ldots$ is the set of all elements that belong to each of the
sets $A_1, A_2, A_3, \ldots$. This intersection is denoted by $A_1 \cap A_2 \cap A_3 \cap \cdots$
or by $A_1 \cap A_2 \cap \cdots \cap A_k$ if a finite number k of sets is involved.

Example 8. Let $A_1 = \{(0, 0), (0, 1), (1, 1)\}$ and $A_2 = \{(1, 1), (1, 2), (2, 1)\}$.
Then $A_1 \cap A_2 = \{(1, 1)\}$.

Example 9. Let $A_1 = \{(x, y) : 0 \le x + y \le 1\}$ and $A_2 = \{(x, y) : 1 < x + y\}$.
Then $A_1$ and $A_2$ have no points in common and $A_1 \cap A_2 = \emptyset$.

Example 10. For every set A, $A \cap A = A$ and $A \cap \emptyset = \emptyset$.
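
For finite sets, the union and intersection of Definitions 3 and 4 correspond directly to Python's built-in set operators. The sketch below reuses the two sets of Example 8; the variable names are illustrative, and the extra set `A` is a made-up example used only to check the identities of Examples 6 and 10.

```python
# The two finite sets of points from Example 8.
A1 = {(0, 0), (0, 1), (1, 1)}
A2 = {(1, 1), (1, 2), (2, 1)}

union = A1 | A2           # elements in at least one of the sets
intersection = A1 & A2    # elements in each of the sets

print(union)
print(intersection)

# Identities from Examples 6 and 10, checked on an arbitrary finite set.
A = {1, 2, 3}
null_set = set()
print(A | A == A, A & A == A, A & null_set == null_set)
```

Sets described by inequalities, such as those of Example 9, are infinite and cannot be stored this way; they would need a membership test instead.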

Example 11. Let

$$A_k = \left\{x : 0 < x < \frac{1}{k}\right\}, \qquad k = 1, 2, 3, \ldots.$$

Then $A_1 \cap A_2 \cap A_3 \cap \cdots$ is the null set, since there is no point that belongs to
each of the sets $A_1, A_2, A_3, \ldots$.

[FIGURE 1.1: Venn diagrams of $A_1 \cup A_2$ and $A_1 \cap A_2$]

Example 12. Let $A_1$ and $A_2$ represent the sets of points enclosed, respectively,
by two intersecting circles. Then the sets $A_1 \cup A_2$ and $A_1 \cap A_2$ are
represented, respectively, by the shaded regions in the Venn diagrams in
Figure 1.1.

Example 13. Let $A_1$, $A_2$, and $A_3$ represent the sets of points enclosed,
respectively, by three intersecting circles. Then the sets $(A_1 \cup A_2) \cap A_3$ and
$(A_1 \cap A_2) \cup A_3$ are depicted in Figure 1.2.

Definition 5. In certain discussions or considerations, the totality
of all elements that pertain to the discussion can be described. This set
of all elements under consideration is given a special name. It is called
the space. We shall often denote spaces by capital script letters such as
$\mathcal{A}$ and $\mathcal{C}$.

Example 14. Let the number of heads, in tossing a coin four times, be 
denoted by x. Of necessity, the number of heads will be one of the numbers 
0, 1, 2, 3, 4. Here, then, the space is the set $\mathcal{A} = \{0, 1, 2, 3, 4\}$.



[FIGURE 1.2: Venn diagrams of $(A_1 \cup A_2) \cap A_3$ and $(A_1 \cap A_2) \cup A_3$]
Example 15. Consider all nondegenerate rectangles of base x and height
y. To be meaningful, both x and y must be positive. Thus the space is the set
$\mathcal{A} = \{(x, y) : x > 0, y > 0\}$.

Definition 6. Let $\mathcal{A}$ denote a space and let A be a subset of the set
$\mathcal{A}$. The set that consists of all elements of $\mathcal{A}$ that are not elements of
A is called the complement of A (actually, with respect to $\mathcal{A}$). The
complement of A is denoted by $A^*$. In particular, $\mathcal{A}^* = \emptyset$.

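Example 16. Let $\mathcal{A}$ be defined as in Example 14, and let the set A = {0, 1}.
The complement of A (with respect to $\mathcal{A}$) is $A^* = \{2, 3, 4\}$.

Example 17. Given $A \subset \mathcal{A}$. Then $A \cup A^* = \mathcal{A}$, $A \cap A^* = \emptyset$,
$A \cup \mathcal{A} = \mathcal{A}$, $A \cap \mathcal{A} = A$, and $(A^*)^* = A$.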
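
The complement of Definition 6 depends on the space, which a short sketch makes explicit. It uses the space of Example 14 and the set of Example 16; the function `complement` and the variable names are illustrative, not notation from the text.

```python
def complement(subset, space):
    """Elements of `space` that are not in `subset` (which must lie in `space`)."""
    if not subset <= space:
        raise ValueError("the set must be a subset of the space")
    return space - subset

space = {0, 1, 2, 3, 4}    # number of heads in four tosses (Example 14)
A = {0, 1}
A_star = complement(A, space)

print(A_star)
print(A | A_star == space, A & A_star == set())
```

Taking the complement of the same A with respect to a larger space would give a different set, which is why the text adds "with respect to" the space.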

In the calculus, functions such as

$$f(x) = 2x, \qquad -\infty < x < \infty,$$

or

$$g(x, y) = e^{-x-y}, \qquad 0 < x < \infty,\ 0 < y < \infty,$$
$$\phantom{g(x, y)} = 0 \quad \text{elsewhere},$$

or possibly

$$h(x_1, x_2, \ldots, x_n) = 3x_1 x_2 \cdots x_n, \qquad 0 \le x_i \le 1,\ i = 1, 2, \ldots, n,$$
$$\phantom{h(x_1, x_2, \ldots, x_n)} = 0 \quad \text{elsewhere},$$

were of common occurrence. The value of f(x) at the “point x = 1” is
f(1) = 2; the value of g(x, y) at the “point (−1, 3)” is g(−1, 3) = 0; the
value of $h(x_1, x_2, \ldots, x_n)$ at the “point (1, 1, ..., 1)” is 3. Functions
such as these are called functions of a point or, more simply, point
functions because they are evaluated (if they have a value) at a point
in a space of indicated dimension.
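
The three point functions can be written as ordinary Python functions and evaluated at the points the text mentions. This sketch reads g as $e^{-x-y}$ on the positive quadrant and h as three times the product of its arguments on the unit cube, both zero elsewhere; that reading is an assumption about the printed formulas, although the values f(1) = 2, g(−1, 3) = 0, and h(1, ..., 1) = 3 are stated in the text.

```python
import math

def f(x):
    """f(x) = 2x for every real x."""
    return 2 * x

def g(x, y):
    """exp(-x - y) when both x and y are positive; zero elsewhere."""
    if x > 0 and y > 0:
        return math.exp(-x - y)
    return 0.0

def h(*xs):
    """3 * x1 * x2 * ... * xn when every xi is in [0, 1]; zero elsewhere."""
    if all(0 <= x <= 1 for x in xs):
        return 3 * math.prod(xs)
    return 0

print(f(1), g(-1, 3), h(1, 1, 1, 1))
```

Each function takes the coordinates of a single point as input, which is the sense in which the text calls them point functions.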

There is no reason why, if they prove useful, we should not have 
functions that can be evaluated, not necessarily at a point, but for an 
entire set of points. Such functions are naturally called functions of a 
set or, more simply, set functions. We shall give some examples of set 
functions and evaluate them for certain simple sets. 

Example 18. Let A be a set in one-dimensional space and let Q(A) be equal
to the number of points in A which correspond to positive integers. Then Q(A)
is a function of the set A. Thus, if $A = \{x : 0 < x < 5\}$, then Q(A) = 4; if
$A = \{-2, -1\}$, then Q(A) = 0; if $A = \{x : -\infty < x < 6\}$, then Q(A) = 5.

Example 19. Let A be a set in two-dimensional space and let Q(A) be the
area of A, if A has a finite area; otherwise, let Q(A) be undefined. Thus, if
$A = \{(x, y) : x^2 + y^2 \le 1\}$, then $Q(A) = \pi$; if $A = \{(0, 0), (1, 1), (0, 1)\}$, then
Q(A) = 0; if $A = \{(x, y) : 0 \le x, 0 \le y, x + y \le 1\}$, then $Q(A) = \frac{1}{2}$.

Example 20. Let A be a set in three-dimensional space and let Q(A) be
the volume of A, if A has a finite volume; otherwise, let Q(A) be undefined.
Thus, if $A = \{(x, y, z) : 0 \le x \le 2, 0 \le y \le 1, 0 \le z \le 3\}$, then Q(A) = 6; if
$A = \{(x, y, z) : x^2 + y^2 + z^2 \ge 1\}$, then Q(A) is undefined.

At this point we introduce the following notations. The symbol

$$\int_A f(x)\,dx$$

will mean the ordinary (Riemann) integral of f(x) over a prescribed
one-dimensional set A; the symbol

$$\iint_A g(x, y)\,dx\,dy$$

will mean the Riemann integral of g(x, y) over a prescribed
two-dimensional set A; and so on. To be sure, unless these sets A and
these functions f(x) and g(x, y) are chosen with care, the integrals
will frequently fail to exist. Similarly, the symbol

$$\sum_A f(x)$$

will mean the sum extended over all $x \in A$; the symbol

$$\sum\sum_A g(x, y)$$

will mean the sum extended over all $(x, y) \in A$; and so on.

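Example 21. Let A be a set in one-dimensional space and let
$Q(A) = \sum_A f(x)$, where

$$f(x) = \left(\tfrac{1}{2}\right)^x, \qquad x = 1, 2, 3, \ldots,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere}.$$

If $A = \{x : 0 \le x \le 3\}$, then

$$Q(A) = \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^3 = \tfrac{7}{8}.$$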
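
A set function defined by a sum can be sketched directly in code. The example below takes f(x) to be $(1/2)^x$ at the positive integers and zero elsewhere, which is an assumed reading of the formula in Example 21, and evaluates Q on the integer points of the interval from 0 to 3, since no other point of that interval contributes to the sum. The names `f` and `Q` mirror the text; exact fractions avoid rounding.

```python
from fractions import Fraction

def f(x):
    """(1/2)**x at x = 1, 2, 3, ...; zero elsewhere."""
    if isinstance(x, int) and x >= 1:
        return Fraction(1, 2) ** x
    return Fraction(0)

def Q(A):
    """Sum of f over the points of a finite collection A."""
    return sum((f(x) for x in A), Fraction(0))

# Integer points of {x : 0 <= x <= 3}.
print(Q(range(0, 4)))
```

The argument of Q is a whole set rather than a single point, which is the distinction between a set function and a point function.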
Example 22. Let $Q(A) = \sum_A f(x)$, where

$$f(x) = p^x (1 - p)^{1-x}, \qquad x = 0, 1,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere}.$$

If A = {0}, then

$$Q(A) = \sum_{x=0}^{0} p^x (1 - p)^{1-x} = 1 - p;$$

if $A = \{x : 1 \le x \le 2\}$, then Q(A) = f(1) = p.

Example 23. Let A be a one-dimensional set and let

$$Q(A) = \int_A e^{-x}\,dx.$$

Thus, if $A = \{x : 0 \le x < \infty\}$, then

$$Q(A) = \int_0^\infty e^{-x}\,dx = 1;$$

if $A = \{x : 1 \le x \le 2\}$, then

$$Q(A) = \int_1^2 e^{-x}\,dx = e^{-1} - e^{-2};$$

if $A_1 = \{x : 0 \le x \le 1\}$ and $A_2 = \{x : 1 < x \le 3\}$, then

$$Q(A_1 \cup A_2) = \int_0^3 e^{-x}\,dx = \int_0^1 e^{-x}\,dx + \int_1^3 e^{-x}\,dx = Q(A_1) + Q(A_2);$$

if $A = A_1 \cup A_2$, where $A_1 = \{x : 0 \le x \le 2\}$ and $A_2 = \{x : 1 \le x \le 3\}$, then

$$Q(A) = Q(A_1 \cup A_2) = \int_0^3 e^{-x}\,dx = \int_0^2 e^{-x}\,dx + \int_1^3 e^{-x}\,dx - \int_1^2 e^{-x}\,dx = Q(A_1) + Q(A_2) - Q(A_1 \cap A_2).$$
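
The last identity of Example 23 can be checked numerically. The sketch below uses the closed form of the integral of $e^{-x}$ over an interval, with the overlapping intervals from 0 to 2 and from 1 to 3; the helper `Q` takes interval endpoints rather than a general set, which is a simplification made for this illustration.

```python
import math

def Q(a, b):
    """Integral of exp(-x) over the interval from a to b, with a <= b."""
    return math.exp(-a) - math.exp(-b)

# A1 = [0, 2] and A2 = [1, 3]: union [0, 3], intersection [1, 2].
lhs = Q(0, 3)
rhs = Q(0, 2) + Q(1, 3) - Q(1, 2)

print(lhs, rhs)
```

Subtracting the integral over the intersection removes the part of the real line that the two separate integrals count twice.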
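Example 24. Let A be a set in n-dimensional space and let

$$Q(A) = \int \cdots \int_A dx_1\,dx_2 \cdots dx_n.$$

If $A = \{(x_1, x_2, \ldots, x_n) : 0 \le x_1 \le x_2 \le \cdots \le x_n \le 1\}$, then

$$Q(A) = \int_0^1 \int_0^{x_n} \cdots \int_0^{x_3} \int_0^{x_2} dx_1\,dx_2 \cdots dx_{n-1}\,dx_n = \frac{1}{n!},$$

where $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$.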
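
The value 1/n! in Example 24 can be made plausible by sampling: the region is the part of the unit cube whose coordinates are in nondecreasing order, so its volume is the fraction of uniformly drawn points that arrive already sorted. The following is a Monte Carlo sketch, not part of the text's derivation; `ordered_fraction`, the trial count, and the seed are illustrative choices.

```python
import math
import random

def ordered_fraction(n, trials=100_000, seed=0):
    """Estimate the volume of {0 <= x1 <= ... <= xn <= 1} by sampling."""
    rng = random.Random(seed)   # fixed seed only so that runs are repeatable
    hits = 0
    for _ in range(trials):
        point = [rng.random() for _ in range(n)]
        # Count the point if its coordinates are in nondecreasing order.
        if all(a <= b for a, b in zip(point, point[1:])):
            hits += 1
    return hits / trials

for n in (2, 3, 4):
    print(n, ordered_fraction(n), 1 / math.factorial(n))
```

The estimate is only approximate, but it tracks 1/n! because each of the n! orderings of the coordinates is equally likely.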
EXERCISES

1.1. Find the union $A_1 \cup A_2$ and the intersection $A_1 \cap A_2$ of the two sets $A_1$
and $A_2$, where:

(a) $A_1 = \{0, 1, 2\}$, $A_2 = \{2, 3, 4\}$.

(b) $A_1 = \{x : 0 < x < 2\}$, $A_2 = \{x : 1 < x < 3\}$.

(c) $A_1 = \{(x, y) : 0 < x < 2, 0 < y < 2\}$,
$A_2 = \{(x, y) : 1 < x < 3, 1 < y < 3\}$.

1.2. Find the complement $A^*$ of the set A with respect to the space $\mathcal{A}$ if:

(a) $\mathcal{A} = \{x : 0 < x < 1\}$, $A = \{x : \frac{5}{8} \le x < 1\}$.

(b) $\mathcal{A} = \{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$, $A = \{(x, y, z) : x^2 + y^2 + z^2 = 1\}$.

(c) $\mathcal{A} = \{(x, y) : |x| + |y| \le 2\}$, $A = \{(x, y) : x^2 + y^2 < 2\}$.

1.3. List all possible arrangements of the four letters m, a, r, and y. Let $A_1$
be the collection of the arrangements in which y is in the last position. Let
$A_2$ be the collection of the arrangements in which m is in the first position.
Find the union and intersection of $A_1$ and $A_2$.

1.4. By use of Venn diagrams, in which the space $\mathcal{A}$ is the set of points
enclosed by a rectangle containing the circles, compare the following sets:

(a) $A_1 \cap (A_2 \cup A_3)$ and $(A_1 \cap A_2) \cup (A_1 \cap A_3)$.

(b) $A_1 \cup (A_2 \cap A_3)$ and $(A_1 \cup A_2) \cap (A_1 \cup A_3)$.

(c) $(A_1 \cup A_2)^*$ and $A_1^* \cap A_2^*$.

(d) $(A_1 \cap A_2)^*$ and $A_1^* \cup A_2^*$.

1.5. If a sequence of sets $A_1, A_2, A_3, \ldots$ is such that $A_k \subset A_{k+1}$,
k = 1, 2, 3, ..., the sequence is said to be a nondecreasing sequence. Give
an example of this kind of sequence of sets.

1.6. If a sequence of sets $A_1, A_2, A_3, \ldots$ is such that $A_k \supset A_{k+1}$,
k = 1, 2, 3, ..., the sequence is said to be a nonincreasing sequence. Give
an example of this kind of sequence of sets.

1.7. If $A_1, A_2, A_3, \ldots$ are sets such that $A_k \subset A_{k+1}$, k = 1, 2, 3, ...,
$\lim_{k \to \infty} A_k$ is defined as the union $A_1 \cup A_2 \cup A_3 \cup \cdots$. Find
$\lim_{k \to \infty} A_k$ if:

(a) $A_k = \{x : 1/k \le x \le 3 - 1/k\}$, k = 1, 2, 3, ....

(b) $A_k = \{(x, y) : 1/k \le x^2 + y^2 \le 4 - 1/k\}$, k = 1, 2, 3, ....

1.8. If $A_1, A_2, A_3, \ldots$ are sets such that $A_k \supset A_{k+1}$, k = 1, 2, 3, ...,
$\lim_{k \to \infty} A_k$ is defined as the intersection $A_1 \cap A_2 \cap A_3 \cap \cdots$. Find
$\lim_{k \to \infty} A_k$ if:

(a) $A_k = \{x : 2 - 1/k < x \le 2\}$, k = 1, 2, 3, ....

(b) $A_k = \{x : 2 < x \le 2 + 1/k\}$, k = 1, 2, 3, ....

(c) $A_k = \{(x, y) : 0 \le x^2 + y^2 \le 1/k\}$, k = 1, 2, 3, ....
1.9. For every one-dimensional set A, let $Q(A) = \sum_A f(x)$, where
$f(x) = \left(\frac{2}{3}\right)\left(\frac{1}{3}\right)^x$, x = 0, 1, 2, ..., zero elsewhere. If
$A_1 = \{x : x = 0, 1, 2, 3\}$ and $A_2 = \{x : x = 0, 1, 2, \ldots\}$, find $Q(A_1)$ and $Q(A_2)$.

Hint: Recall that $S_n = a + ar + \cdots + ar^{n-1} = a(1 - r^n)/(1 - r)$ and
$\lim_{n \to \infty} S_n = a/(1 - r)$ provided that $|r| < 1$.

1.10. For every one-dimensional set A for which the integral exists, let
$Q(A) = \int_A f(x)\,dx$, where $f(x) = 6x(1 - x)$, 0 < x < 1, zero elsewhere;
otherwise, let Q(A) be undefined. If $A_1 = \{x : \frac{1}{4} < x < \frac{3}{4}\}$, $A_2 = \{\frac{1}{2}\}$, and
$A_3 = \{x : 0 < x < 10\}$, find $Q(A_1)$, $Q(A_2)$, and $Q(A_3)$.

1.11. Let $Q(A) = \iint_A (x^2 + y^2)\,dx\,dy$ for every two-dimensional set A for
which the integral exists; otherwise, let Q(A) be undefined. If
$A_1 = \{(x, y) : -1 \le x \le 1, -1 \le y \le 1\}$, $A_2 = \{(x, y) : -1 \le x = y \le 1\}$,
and $A_3 = \{(x, y) : x^2 + y^2 \le 1\}$, find $Q(A_1)$, $Q(A_2)$, and $Q(A_3)$.

Hint: In evaluating $Q(A_2)$, recall the definition of the double integral (or
consider the volume under the surface $z = x^2 + y^2$ above the line segment
$-1 \le x = y \le 1$ in the xy-plane). Use polar coordinates in the calculation
of $Q(A_3)$.

1.12. Let $\mathcal{A}$ denote the set of points that are interior to, or on the boundary
of, a square with opposite vertices at the points (0, 0) and (1, 1). Let
$Q(A) = \iint_A dy\,dx$.

(a) If $A \subset \mathcal{A}$ is the set $\{(x, y) : 0 < x < y < 1\}$, compute Q(A).

(b) If $A \subset \mathcal{A}$ is the set $\{(x, y) : 0 < x = y < 1\}$, compute Q(A).

(c) If $A \subset \mathcal{A}$ is the set $\{(x, y) : 0 < x/2 \le y \le 3x/2 < 1\}$, compute Q(A).

1.13. Let $\mathcal{A}$ be the set of points interior to or on the boundary of a cube with
edge of length 1. Moreover, say that the cube is in the first octant with one
vertex at the point (0, 0, 0) and an opposite vertex at the point (1, 1, 1). Let
$Q(A) = \iiint_A dx\,dy\,dz$.

(a) If $A \subset \mathcal{A}$ is the set $\{(x, y, z) : 0 < x < y < z < 1\}$, compute Q(A).

(b) If A is the subset $\{(x, y, z) : 0 < x = y = z < 1\}$, compute Q(A).

1.14. Let A denote the set $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$. Evaluate
$Q(A) = \iiint_A \sqrt{x^2 + y^2 + z^2}\,dx\,dy\,dz$.

Hint: Use spherical coordinates. 

1.15. To join a certain club, a person must be either a statistician or a 
mathematician or both. Of the 25 members in this club, 19 are statisticians 
and 16 are mathematicians. How many persons in the club are both a 
statistician and a mathematician? 
1.16. After a hard-fought football game, it was reported that, of the 11
starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a 
hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a 
knee, and no one hurt all three. Comment on the accuracy of the report. 


1.3 The Probability Set Function

Let $\mathcal{C}$ denote the set of every possible outcome of a random
experiment; that is, $\mathcal{C}$ is the sample space. It is our purpose to define
a set function P(C) such that if C is a subset of $\mathcal{C}$, then P(C) is the
probability that the outcome of the random experiment is an element 
of C. Henceforth it will be tacitly assumed that the structure of each 
set C is sufficiently simple to allow the computation. We have already 
seen that advantages accrue if we take P(C) to be that number about 
which the relative frequency f/N of the event C tends to stabilize after 
a long series of experiments. This important fact suggests some of the 
properties that we would surely want the set function P(C) to possess. 
For example, no relative frequency is ever negative; accordingly, we 
would want P(C) to be a nonnegative set function. Again, the relative
frequency of the whole sample space $\mathcal{C}$ is always 1. Thus we would want
$P(\mathcal{C}) = 1$. Finally, if $C_1, C_2, C_3, \ldots$ are subsets of $\mathcal{C}$ such that no two
of these subsets have a point in common, the relative frequency of the 
union of these sets is the sum of the relative frequencies of the sets, and 
we would want the set function P(C) to reflect this additive property. 
We now formally define a probability set function. 


Definition 7. If P(C) is defined for a type of subset of the space $\mathcal{C}$,
and if

(a) $P(C) \ge 0$,

(b) $P(C_1 \cup C_2 \cup C_3 \cup \cdots) = P(C_1) + P(C_2) + P(C_3) + \cdots$, where
the sets $C_i$, i = 1, 2, 3, ..., are such that no two have a point
in common (that is, where $C_i \cap C_j = \emptyset$, $i \ne j$),

(c) $P(\mathcal{C}) = 1$,

then P is called the probability set function of the outcome of the
random experiment. For each subset C of $\mathcal{C}$, the number P(C) is called
the probability that the outcome of the random experiment is an
element of the set C, or the probability of the event C, or the probability
measure of the set C.
A probability set function tells us how the probability is
distributed over various subsets C of a sample space $\mathcal{C}$. In this sense we
speak of a distribution of probability.

Remark. In the definition, the phrase “a type of subset of the space $\mathcal{C}$”
refers to the fact that P is a probability measure on a sigma field of subsets
of $\mathcal{C}$ and would be explained more fully in a more advanced course.
Nevertheless, a few observations can be made about the collection of subsets
that are of the type. From condition (c) of the definition, we see that the space
$\mathcal{C}$ must be in the collection. Condition (b) implies that if the sets $C_1, C_2, C_3, \ldots$
are in the collection, their union is also one of that type. Finally, we observe
from the following theorems and their proofs that if the set C is in the
collection, its complement must be one of those subsets. In particular, the null
set, which is the complement of $\mathcal{C}$, must be in the collection.

The following theorems give us some other properties of a
probability set function. In the statement of each of these theorems,
$P(C)$ is taken, tacitly, to be a probability set function defined for a
certain type of subset of the sample space $\mathcal{C}$.

Theorem 1. For each $C \subset \mathcal{C}$, $P(C) = 1 - P(C^*)$.

Proof. We have $\mathcal{C} = C \cup C^*$ and $C \cap C^* = \emptyset$. Thus,
from (c) and (b) of Definition 7, it follows that

$$1 = P(C) + P(C^*),$$

which is the desired result.

Theorem 2. The probability of the null set is zero; that is, $P(\emptyset) = 0$.

Proof. In Theorem 1, take $C = \emptyset$ so that $C^* = \mathcal{C}$.
Accordingly, we have

$$P(\emptyset) = 1 - P(\mathcal{C}) = 1 - 1 = 0,$$

and the theorem is proved.

Theorem 3. If $C_1$ and $C_2$ are subsets of $\mathcal{C}$ such that
$C_1 \subset C_2$, then $P(C_1) \le P(C_2)$.

Proof. Now $C_2 = C_1 \cup (C_1^* \cap C_2)$ and
$C_1 \cap (C_1^* \cap C_2) = \emptyset$. Hence, from (b) of Definition 7,

$$P(C_2) = P(C_1) + P(C_1^* \cap C_2).$$

However, from (a) of Definition 7, $P(C_1^* \cap C_2) \ge 0$; accordingly,
$P(C_2) \ge P(C_1)$.

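Theorem 4. For each $C \subset \mathcal{C}$, $0 \le P(C) \le 1$.

Proof. Since $\emptyset \subset C \subset \mathcal{C}$, we have by Theorem 3 that

$$P(\emptyset) \le P(C) \le P(\mathcal{C}) \quad\text{or}\quad 0 \le P(C) \le 1,$$

the desired result.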

Theorem 5. If $C_1$ and $C_2$ are subsets of $\mathcal{C}$, then

$$P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1 \cap C_2).$$

Proof. Each of the sets $C_1 \cup C_2$ and $C_2$ can be represented,
respectively, as a union of nonintersecting sets as follows:

$$C_1 \cup C_2 = C_1 \cup (C_1^* \cap C_2) \quad\text{and}\quad C_2 = (C_1 \cap C_2) \cup (C_1^* \cap C_2).$$

Thus, from (b) of Definition 7,

$$P(C_1 \cup C_2) = P(C_1) + P(C_1^* \cap C_2)$$

and

$$P(C_2) = P(C_1 \cap C_2) + P(C_1^* \cap C_2).$$

If the second of these equations is solved for $P(C_1^* \cap C_2)$ and this
result substituted in the first equation, we obtain

$$P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1 \cap C_2).$$

This completes the proof.
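
As a quick sanity check on Theorems 1 through 5, the following Python sketch builds a probability set function on a small finite sample space and confirms the identities numerically. The fair six-sided die, the events `C1` and `C2`, and the helper name `P` are assumptions chosen for illustration; they are not taken from the text.

```python
from fractions import Fraction

# Assumed sample space: the six faces of a fair die.
space = frozenset(range(1, 7))

def P(event):
    """Equally likely probability set function on `space`."""
    return Fraction(len(event & space), len(space))

C1 = frozenset({2, 4, 6})   # an even number of spots
C2 = frozenset({1, 2, 3})   # at most three spots

# Theorem 1: P(C) = 1 - P(C*), where C* is the complement of C.
assert P(C1) == 1 - P(space - C1)

# Theorem 2: the null set has probability zero.
assert P(frozenset()) == 0

# Theorem 3: a subset never has the larger probability.
assert P(C1 & C2) <= P(C1)

# Theorem 4: every probability lies between 0 and 1.
assert 0 <= P(C2) <= 1

# Theorem 5: P(C1 u C2) = P(C1) + P(C2) - P(C1 n C2).
union_direct = P(C1 | C2)
union_by_theorem = P(C1) + P(C2) - P(C1 & C2)
print(union_direct, union_by_theorem)  # 5/6 5/6
```

Exact rational arithmetic (`Fraction`) is used so that the equalities hold exactly rather than up to floating-point rounding.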


Example 1. Let $\mathcal{C}$ denote the sample space of Example 2 of Section 1.1.
Let the probability set function assign a probability of $\frac{1}{36}$ to each
of the 36 points in $\mathcal{C}$. If $C_1 = \{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)\}$
and $C_2 = \{(1, 2), (2, 2), (3, 2)\}$, then $P(C_1) = \frac{5}{36}$,
$P(C_2) = \frac{3}{36}$, $P(C_1 \cup C_2) = \frac{8}{36}$, and $P(C_1 \cap C_2) = 0$.

Example 2. Two coins are to be tossed and the outcome is the ordered
pair (face on the first coin, face on the second coin). Thus the sample
space may be represented as $\mathcal{C} = \{(H, H), (H, T), (T, H), (T, T)\}$.
Let the probability set function assign a probability of $\frac{1}{4}$ to each
element of $\mathcal{C}$. Let $C_1 = \{(H, H), (H, T)\}$ and
$C_2 = \{(H, H), (T, H)\}$. Then $P(C_1) = P(C_2) = \frac{1}{2}$,
$P(C_1 \cap C_2) = \frac{1}{4}$, and, in accordance with Theorem 5,
$P(C_1 \cup C_2) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$.

Let $\mathcal{C}$ denote a sample space and let $C_1, C_2, C_3, \ldots$ denote
subsets of $\mathcal{C}$. If these subsets are such that no two have an element
in common, they are called mutually disjoint sets and the corresponding
events $C_1, C_2, C_3, \ldots$ are said to be mutually exclusive events. Then,
for example, $P(C_1 \cup C_2 \cup C_3 \cup \cdots) = P(C_1) + P(C_2) + P(C_3) + \cdots$,
in accordance with (b) of Definition 7. Moreover, if
$\mathcal{C} = C_1 \cup C_2 \cup C_3 \cup \cdots$, the mutually exclusive events
are further characterized as being exhaustive and the probability of their
union is obviously equal to 1.

Let $\mathcal{C}$ be partitioned into $k$ mutually disjoint subsets
$C_1, C_2, \ldots, C_k$ in such a way that the union of these $k$ mutually
disjoint subsets is the sample space $\mathcal{C}$. Thus the events
$C_1, C_2, \ldots, C_k$ are mutually exclusive and exhaustive. Suppose that the
random experiment is of such a character that it is reasonable to assume that
each of the mutually exclusive and exhaustive events $C_i$, $i = 1, 2, \ldots, k$,
has the same probability. It is necessary, then, that $P(C_i) = 1/k$,
$i = 1, 2, \ldots, k$; and we often say that the events $C_1, C_2, \ldots, C_k$
are equally likely. Let the event $E$ be the union of $r$ of these mutually
exclusive events, say

$$E = C_1 \cup C_2 \cup \cdots \cup C_r, \quad r \le k.$$

Then

$$P(E) = P(C_1) + P(C_2) + \cdots + P(C_r) = \frac{r}{k}.$$


Frequently, the integer $k$ is called the total number of ways (for this
particular partition of $\mathcal{C}$) in which the random experiment can
terminate and the integer $r$ is called the number of ways that are
favorable to the event $E$. So, in this terminology, $P(E)$ is equal to the
number of ways favorable to the event $E$ divided by the total number
of ways in which the experiment can terminate. It should be
emphasized that in order to assign, in this manner, the probability $r/k$
to the event $E$, we must assume that each of the mutually exclusive and
exhaustive events $C_1, C_2, \ldots, C_k$ has the same probability $1/k$. This
assumption of equally likely events then becomes a part of our
probability model. Obviously, if this assumption is not realistic in an
application, the probability of the event $E$ cannot be computed in this
way.

We next present an example that is illustrative of this model. 

Example 3. Let a card be drawn at random from an ordinary deck of
52 playing cards. The sample space $\mathcal{C}$ is the union of $k = 52$
outcomes, and it is reasonable to assume that each of these outcomes has the
same probability $\frac{1}{52}$. Accordingly, if $E_1$ is the set of outcomes
that are spades, $P(E_1) = \frac{13}{52} = \frac{1}{4}$ because there are
$r_1 = 13$ spades in the deck; that is, $\frac{1}{4}$ is the probability of
drawing a card that is a spade. If $E_2$ is the set of outcomes that are kings,
$P(E_2) = \frac{4}{52} = \frac{1}{13}$ because there are $r_2 = 4$ kings in the
deck; that is, $\frac{1}{13}$ is the probability of drawing a card that is a
king. These computations are very easy because there are no difficulties in
the determination of the appropriate values of $r$ and $k$. However, instead of
drawing only one card, suppose that five cards are taken, at random and
without replacement, from this deck. We can think of each five-card hand as
being an outcome in a sample space. It is reasonable to assume that each of
these outcomes has the same probability. Now if $E_1$ is the set of outcomes
in which each card of the hand is a spade, $P(E_1)$ is equal to the number
$r_1$ of all spade hands divided by the total number, say $k$, of five-card
hands. It is shown in many books on algebra that

$$r_1 = \binom{13}{5} = \frac{13!}{5!\,8!} \quad\text{and}\quad k = \binom{52}{5} = \frac{52!}{5!\,47!}.$$


In general, if $n$ is a positive integer and if $x$ is a nonnegative integer
with $x \le n$, then the binomial coefficient

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

is equal to the number of combinations of $n$ things taken $x$ at a time. If
$x = 0$, $0! = 1$, so that $\binom{n}{0} = 1$. Thus, in the special case
involving $E_1$,

$$P(E_1) = \frac{\binom{13}{5}}{\binom{52}{5}} = \frac{(13)(12)(11)(10)(9)}{(52)(51)(50)(49)(48)} = 0.0005,$$

approximately. Next, let $E_2$ be the set of outcomes in which at least one
card is a spade. Then $E_2^*$ is the set of outcomes in which no card is a
spade. There are $r_2^* = \binom{39}{5}$ such outcomes. Hence

$$P(E_2^*) = \frac{\binom{39}{5}}{\binom{52}{5}} \quad\text{and}\quad P(E_2) = 1 - P(E_2^*).$$

Now suppose that $E_3$ is the set of outcomes in which exactly three cards are
kings and exactly two cards are queens. We can select the three kings in
any one of $\binom{4}{3}$ ways and the two queens in any one of $\binom{4}{2}$
ways. By a well-known counting principle, the number of outcomes in $E_3$ is
$r_3 = \binom{4}{3}\binom{4}{2}$. Thus
$P(E_3) = \binom{4}{3}\binom{4}{2} \big/ \binom{52}{5}$. Finally, let $E_4$ be
the set of outcomes in which there are exactly two kings, two queens, and one
jack. Then

$$P(E_4) = \frac{\binom{4}{2}\binom{4}{2}\binom{4}{1}}{\binom{52}{5}},$$

because the numerator of this fraction is the number of outcomes in $E_4$.
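
The counts in Example 3 are easy to reproduce with a binomial-coefficient routine. The sketch below uses Python's `math.comb` for $\binom{n}{x}$; the variable names are chosen here for illustration and do not appear in the text.

```python
from fractions import Fraction
from math import comb

k = comb(52, 5)  # total number of five-card hands

# E1: every card in the hand is a spade.
p_all_spades = Fraction(comb(13, 5), k)

# E2: at least one spade, via the complement "no spade at all".
p_no_spade = Fraction(comb(39, 5), k)
p_at_least_one_spade = 1 - p_no_spade

# E3: exactly three kings and exactly two queens.
p_three_kings_two_queens = Fraction(comb(4, 3) * comb(4, 2), k)

# E4: exactly two kings, two queens, and one jack.
p_two_kings_two_queens_one_jack = Fraction(comb(4, 2) * comb(4, 2) * comb(4, 1), k)

print(round(float(p_all_spades), 4))  # 0.0005
print(float(p_at_least_one_spade))
print(float(p_three_kings_two_queens))
print(float(p_two_kings_two_queens_one_jack))
```

Each probability is a ratio $r/k$ of a favorable count to the total count, which is valid only under the equally likely assumption stated in the example.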


Example 3 and the previous discussion allow us to see one way in
which we can define a probability set function, that is, a set function
that satisfies the requirements of Definition 7. Suppose that our space
$\mathcal{C}$ consists of $k$ distinct points, which, for this discussion, we
take to be in a one-dimensional space. If the random experiment that ends in
one of those $k$ points is such that it is reasonable to assume that these
points are equally likely, we could assign $1/k$ to each point and let, for
$C \subset \mathcal{C}$,

$$P(C) = \frac{\text{number of points in } C}{k} = \sum_{x \in C} f(x), \quad\text{where } f(x) = \frac{1}{k},\ x \in \mathcal{C}.$$

For illustration, in the cast of a die, we could take
$\mathcal{C} = \{1, 2, 3, 4, 5, 6\}$ and $f(x) = \frac{1}{6}$, $x \in \mathcal{C}$,
if we believe the die to be unbiased. Clearly, such a set function satisfies
Definition 7.

The word unbiased in this illustration suggests the possibility that
all six points might not, in all such cases, be equally likely. As a matter
of fact, loaded dice do exist. In the case of a loaded die, some numbers
occur more frequently than others in a sequence of casts of that die.
For example, suppose that a die has been loaded so that the relative
frequencies of the numbers in $\mathcal{C}$ seem to stabilize proportional to
the number of spots that are on the up side. Thus we might assign
$f(x) = x/21$, $x \in \mathcal{C}$, and the corresponding

$$P(C) = \sum_{x \in C} f(x)$$

would satisfy Definition 7. For illustration, this means that if
$C = \{1, 2, 3\}$, then

$$P(C) = \sum_{x=1}^{3} f(x) = \frac{1}{21} + \frac{2}{21} + \frac{3}{21} = \frac{6}{21} = \frac{2}{7}.$$

Whether this probability set function is realistic can only be checked
by performing the random experiment a large number of times.
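
The loaded-die assignment can be written directly as a point function $f$ and a set function $P$ that sums $f$ over an event. The following is a minimal sketch under the assignment $f(x) = x/21$ described above; the names `f`, `P`, and `space` are illustrative.

```python
from fractions import Fraction

space = range(1, 7)  # the six faces of the die

def f(x):
    """Probability of face x for the loaded die: proportional to the spots."""
    return Fraction(x, 21)

def P(event):
    """Probability set function: sum f over the points of the event."""
    return sum((f(x) for x in event if x in space), Fraction(0))

print(P({1, 2, 3}))       # 2/7
print(P(set(space)))      # 1
```

The second line confirms condition (c) of Definition 7: the six point probabilities add to 1 because $1 + 2 + \cdots + 6 = 21$.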


EXERCISES 

1.17. A positive integer from one to six is to be chosen by casting a die. Thus
the elements $c$ of the sample space $\mathcal{C}$ are 1, 2, 3, 4, 5, 6. Let
$C_1 = \{1, 2, 3, 4\}$, $C_2 = \{3, 4, 5, 6\}$. If the probability set function
$P$ assigns a probability of $\frac{1}{6}$ to each of the elements of
$\mathcal{C}$, compute $P(C_1)$, $P(C_2)$, $P(C_1 \cap C_2)$, and
$P(C_1 \cup C_2)$.

1.18. A random experiment consists of drawing a card from an ordinary deck
of 52 playing cards. Let the probability set function $P$ assign a probability
of $\frac{1}{52}$ to each of the 52 possible outcomes. Let $C_1$ denote the
collection of the 13 hearts and let $C_2$ denote the collection of the 4 kings.
Compute $P(C_1)$, $P(C_2)$, $P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.19. A coin is to be tossed as many times as necessary to turn up one head.
Thus the elements $c$ of the sample space $\mathcal{C}$ are H, TH, TTH, TTTH,
and so forth. Let the probability set function $P$ assign to these elements the
respective probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}$,
and so forth. Show that $P(\mathcal{C}) = 1$. Let
$C_1 = \{c : c \text{ is H, TH, TTH, TTTH, or TTTTH}\}$. Compute $P(C_1)$. Let
$C_2 = \{c : c \text{ is TTTTH or TTTTTH}\}$. Compute $P(C_2)$,
$P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.20. If the sample space is $\mathcal{C} = C_1 \cup C_2$ and if $P(C_1) = 0.8$
and $P(C_2) = 0.5$, find $P(C_1 \cap C_2)$.

1.21. Let the sample space be $\mathcal{C} = \{c : 0 < c < \infty\}$. Let
$C \subset \mathcal{C}$ be defined by $C = \{c : 4 < c < \infty\}$ and take
$P(C) = \int_C e^{-x}\,dx$. Evaluate $P(C)$, $P(C^*)$, and $P(C \cup C^*)$.

1.22. If the sample space is $\mathcal{C} = \{c : -\infty < c < \infty\}$ and if
$C \subset \mathcal{C}$ is a set for which the integral $\int_C e^{-|x|}\,dx$
exists, show that this set function is not a probability set function. What
constant do we multiply the integral by to make it a probability set function?

1.23. If $C_1$ and $C_2$ are subsets of the sample space $\mathcal{C}$, show that

$$P(C_1 \cap C_2) \le P(C_1) \le P(C_1 \cup C_2) \le P(C_1) + P(C_2).$$

1.24. Let $C_1$, $C_2$, and $C_3$ be three mutually disjoint subsets of the
sample space $\mathcal{C}$. Find $P[(C_1 \cup C_2) \cap C_3]$ and
$P(C_1^* \cup C_2^*)$.

1.25. If $C_1$, $C_2$, and $C_3$ are subsets of $\mathcal{C}$, show that

$$P(C_1 \cup C_2 \cup C_3) = P(C_1) + P(C_2) + P(C_3) - P(C_1 \cap C_2) - P(C_1 \cap C_3) - P(C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_3).$$

What is the generalization of this result to four or more subsets of
$\mathcal{C}$?
Hint: Write $P(C_1 \cup C_2 \cup C_3) = P[C_1 \cup (C_2 \cup C_3)]$ and use
Theorem 5.

Remark. In order to solve a number of exercises, like 1.26-1.31, certain 
reasonable assumptions must be made. 

1.26. A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue.
If four chips are taken at random and without replacement, find the
probability that: (a) each of the 4 chips is red; (b) none of the 4 chips is red;
(c) there is at least 1 chip of each color.

1.27. A person has purchased 10 of 1000 tickets sold in a certain raffle. To
determine the five prize winners, 5 tickets are to be drawn at random and
without replacement. Compute the probability that this person will win at
least one prize.

Hint: First compute the probability that the person does not win a prize.

1.28. Compute the probability of being dealt at random and without 
replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2 
diamonds, and 1 club; (b) 13 cards of the same suit. 

1.29. Three distinct integers are chosen at random from the first 20 positive 
integers. Compute the probability that: (a) their sum is even; (b) their 
product is even. 

1.30. There are 5 red chips and 3 blue chips in a bowl. The red chips are
numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2,
3, respectively. If 2 chips are to be drawn at random and without
replacement, find the probability that these chips have either the same
number or the same color.

1.31. In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines 
5 bulbs, which are selected at random and without replacement. 

(a) Find the probability of at least 1 defective bulb among the 5. 

(b) How many bulbs should he examine so that the probability of finding
at least 1 bad bulb exceeds $\frac{1}{2}$?


1.4 Conditional Probability and Independence 

In some random experiments, we are interested only in those
outcomes that are elements of a subset $C_1$ of the sample space
$\mathcal{C}$. This means, for our purposes, that the sample space is
effectively the subset $C_1$. We are now confronted with the problem of
defining a probability set function with $C_1$ as the “new” sample space.

Let the probability set function $P(C)$ be defined on the sample space
$\mathcal{C}$ and let $C_1$ be a subset of $\mathcal{C}$ such that
$P(C_1) > 0$. We agree to consider only those outcomes of the random experiment
that are elements of $C_1$; in essence, then, we take $C_1$ to be a sample
space. Let $C_2$ be another subset of $\mathcal{C}$. How, relative to the new
sample space $C_1$, do we want to define the probability of the event $C_2$?
Once defined, this probability is called the conditional probability of the
event $C_2$, relative to the hypothesis of the event $C_1$; or, more briefly,
the conditional probability of $C_2$, given $C_1$. Such a conditional
probability is denoted by the symbol $P(C_2 \mid C_1)$. We now return to the
question that was raised about the definition of this symbol. Since $C_1$ is
now the sample space, the only elements of $C_2$ that concern us are those, if
any, that are also elements of $C_1$, that is, the elements of $C_1 \cap C_2$.
It seems desirable, then, to define the symbol $P(C_2 \mid C_1)$ in such a way
that

$$P(C_1 \mid C_1) = 1 \quad\text{and}\quad P(C_2 \mid C_1) = P(C_1 \cap C_2 \mid C_1).$$

Moreover, from a relative frequency point of view, it would seem
logically inconsistent if we did not require that the ratio of the
probabilities of the events $C_1 \cap C_2$ and $C_1$, relative to the space
$C_1$, be the same as the ratio of the probabilities of these events relative
to the space $\mathcal{C}$; that is, we should have

$$\frac{P(C_1 \cap C_2 \mid C_1)}{P(C_1 \mid C_1)} = \frac{P(C_1 \cap C_2)}{P(C_1)}.$$

These three desirable conditions imply that the relation

$$P(C_2 \mid C_1) = \frac{P(C_1 \cap C_2)}{P(C_1)}$$

is a suitable definition of the conditional probability of the event $C_2$,
given the event $C_1$, provided that $P(C_1) > 0$. Moreover, we have

1. $P(C_2 \mid C_1) \ge 0$.

2. $P(C_2 \cup C_3 \cup \cdots \mid C_1) = P(C_2 \mid C_1) + P(C_3 \mid C_1) + \cdots$,
provided that $C_2, C_3, \ldots$ are mutually disjoint sets.

3. $P(C_1 \mid C_1) = 1$.

Properties (1) and (3) are evident; proof of property (2) is left as an
exercise (1.32). But these are precisely the conditions that a probability
set function must satisfy. Accordingly, $P(C_2 \mid C_1)$ is a probability set
function, defined for subsets of $C_1$. It may be called the conditional
probability set function, relative to the hypothesis $C_1$; or the
conditional probability set function, given $C_1$. It should be noted that
this conditional probability set function, given $C_1$, is defined at this
time only when $P(C_1) > 0$.

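Example 1. A hand of 5 cards is to be dealt at random without
replacement from an ordinary deck of 52 playing cards. The conditional
probability of an all-spade hand ($C_2$), relative to the hypothesis that there
are at least 4 spades in the hand ($C_1$), is, since $C_1 \cap C_2 = C_2$,

$$P(C_2 \mid C_1) = \frac{P(C_2)}{P(C_1)} = \frac{\binom{13}{5} \big/ \binom{52}{5}}{\left[\binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \big/ \binom{52}{5}} = \frac{\binom{13}{5}}{\binom{13}{4}\binom{39}{1} + \binom{13}{5}}.$$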
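
The conditional probability in Example 1 can be evaluated by counting hands. The sketch below assumes, as in the example, that all five-card hands are equally likely; "at least four spades" is counted as "exactly four spades" plus "exactly five spades". Variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)

# C2: all five cards are spades.
p_c2 = Fraction(comb(13, 5), total_hands)

# C1: at least four spades = exactly four spades or exactly five spades.
p_c1 = Fraction(comb(13, 4) * comb(39, 1) + comb(13, 5), total_hands)

# C2 is a subset of C1, so P(C1 n C2) = P(C2).
p_c2_given_c1 = p_c2 / p_c1
print(p_c2_given_c1)  # 3/68
```

Note that the total number of hands cancels in the ratio, so only the two favorable counts matter.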



From the definition of the conditional probability set function, we
observe that

$$P(C_1 \cap C_2) = P(C_1)P(C_2 \mid C_1).$$

This relation is frequently called the multiplication rule for
probabilities. Sometimes, after considering the nature of the random
experiment, it is possible to make reasonable assumptions so that both
$P(C_1)$ and $P(C_2 \mid C_1)$ can be assigned. Then $P(C_1 \cap C_2)$ can be
computed under these assumptions. This will be illustrated in Examples 2 and 3.

Example 2. A bowl contains eight chips. Three of the chips are red and
the remaining five are blue. Two chips are to be drawn successively, at random
and without replacement. We want to compute the probability that the first
draw results in a red chip ($C_1$) and that the second draw results in a blue
chip ($C_2$). It is reasonable to assign the following probabilities:

$$P(C_1) = \frac{3}{8} \quad\text{and}\quad P(C_2 \mid C_1) = \frac{5}{7}.$$

Thus, under these assignments, we have
$P(C_1 \cap C_2) = \left(\frac{3}{8}\right)\left(\frac{5}{7}\right) = \frac{15}{56}$.

Example 3. From an ordinary deck of playing cards, cards are to be
drawn successively, at random and without replacement. The probability that
the third spade appears on the sixth draw is computed as follows. Let $C_1$ be
the event of two spades in the first five draws and let $C_2$ be the event of a
spade on the sixth draw. Thus the probability that we wish to compute is
$P(C_1 \cap C_2)$. It is reasonable to take

$$P(C_1) = \frac{\binom{13}{2}\binom{39}{3}}{\binom{52}{5}} \quad\text{and}\quad P(C_2 \mid C_1) = \frac{11}{47}.$$

The desired probability $P(C_1 \cap C_2)$ is then the product of these two
numbers.
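
The multiplication rule in Examples 2 and 3 can be checked numerically. In the sketch below, Example 2 is verified by brute-force enumeration of all ordered draws, and Example 3 is evaluated from the two factors named in the text. The chip labels and variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

# Example 2: three red and five blue chips, two ordered draws without replacement.
chips = ["R"] * 3 + ["B"] * 5
draws = list(permutations(range(len(chips)), 2))      # 8 * 7 = 56 ordered pairs
favorable = [d for d in draws if chips[d[0]] == "R" and chips[d[1]] == "B"]
p_enumerated = Fraction(len(favorable), len(draws))
p_rule = Fraction(3, 8) * Fraction(5, 7)              # P(C1) * P(C2 | C1)
print(p_enumerated, p_rule)  # 15/56 15/56

# Example 3: third spade on the sixth draw.
p_c1 = Fraction(comb(13, 2) * comb(39, 3), comb(52, 5))  # two spades in first five
p_c2_given_c1 = Fraction(11, 47)                      # 11 spades left among 47 cards
p_third_spade_on_sixth = p_c1 * p_c2_given_c1
print(float(p_third_spade_on_sixth))
```

The enumeration treats every ordered pair of distinct chips as equally likely, which is the assumption behind the assignments $\frac{3}{8}$ and $\frac{5}{7}$.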

The multiplication rule can be extended to three or more events. In
the case of three events, we have, by using the multiplication rule for
two events,

$$P(C_1 \cap C_2 \cap C_3) = P[(C_1 \cap C_2) \cap C_3] = P(C_1 \cap C_2)P(C_3 \mid C_1 \cap C_2).$$

But $P(C_1 \cap C_2) = P(C_1)P(C_2 \mid C_1)$. Hence

$$P(C_1 \cap C_2 \cap C_3) = P(C_1)P(C_2 \mid C_1)P(C_3 \mid C_1 \cap C_2).$$

This procedure can be used to extend the multiplication rule to four
or more events. The general formula for $k$ events can be proved by
mathematical induction.


Example 4. Four cards are to be dealt successively, at random and without
replacement, from an ordinary deck of playing cards. The probability
of receiving a spade, a heart, a diamond, and a club, in that order, is
$\left(\frac{13}{52}\right)\left(\frac{13}{51}\right)\left(\frac{13}{50}\right)\left(\frac{13}{49}\right)$.
This follows from the extension of the multiplication rule. In
this computation, the assumptions that are involved seem clear.

Let the space $\mathcal{C}$ be partitioned into $k$ mutually exclusive and
exhaustive events $C_1, C_2, \ldots, C_k$ such that $P(C_i) > 0$,
$i = 1, 2, \ldots, k$. Here the events $C_1, C_2, \ldots, C_k$ do not need to be
equally likely. Let $C$ be another event such that $P(C) > 0$. Thus $C$ occurs
with one and only one of the events $C_1, C_2, \ldots, C_k$; that is,

$$C = C \cap (C_1 \cup C_2 \cup \cdots \cup C_k) = (C \cap C_1) \cup (C \cap C_2) \cup \cdots \cup (C \cap C_k).$$

Since $C \cap C_i$, $i = 1, 2, \ldots, k$, are mutually exclusive, we have

$$P(C) = P(C \cap C_1) + P(C \cap C_2) + \cdots + P(C \cap C_k).$$

However, $P(C \cap C_i) = P(C_i)P(C \mid C_i)$, $i = 1, 2, \ldots, k$; so

$$P(C) = P(C_1)P(C \mid C_1) + P(C_2)P(C \mid C_2) + \cdots + P(C_k)P(C \mid C_k) = \sum_{i=1}^{k} P(C_i)P(C \mid C_i).$$

This result is sometimes called the law of total probability.

From the definition of conditional probability, we have, using the
law of total probability, that

$$P(C_j \mid C) = \frac{P(C \cap C_j)}{P(C)} = \frac{P(C_j)P(C \mid C_j)}{\sum_{i=1}^{k} P(C_i)P(C \mid C_i)},$$

which is the well-known Bayes’ theorem. This permits us to calculate
the conditional probability of $C_j$, given $C$, from the probabilities of
$C_1, C_2, \ldots, C_k$ and the conditional probabilities of $C$, given $C_i$,
$i = 1, 2, \ldots, k$.


Example 5. Say it is known that bowl $C_1$ contains 3 red and 7 blue chips
and bowl $C_2$ contains 8 red and 2 blue chips. All chips are identical in size
and shape. A die is cast and bowl $C_1$ is selected if five or six spots show
on the side that is up; otherwise, bowl $C_2$ is selected. In a notation that
is fairly obvious, it seems reasonable to assign $P(C_1) = \frac{2}{6}$ and
$P(C_2) = \frac{4}{6}$. The selected bowl is handed to another person and one
chip is taken at random. Say that this chip is red, an event which we denote
by $C$. By considering the contents of the bowls, it is reasonable to assign
the conditional probabilities $P(C \mid C_1) = \frac{3}{10}$ and
$P(C \mid C_2) = \frac{8}{10}$. Thus the conditional probability of bowl $C_1$,
given that a red chip is drawn, is

$$P(C_1 \mid C) = \frac{P(C_1)P(C \mid C_1)}{P(C_1)P(C \mid C_1) + P(C_2)P(C \mid C_2)} = \frac{\left(\frac{2}{6}\right)\left(\frac{3}{10}\right)}{\left(\frac{2}{6}\right)\left(\frac{3}{10}\right) + \left(\frac{4}{6}\right)\left(\frac{8}{10}\right)} = \frac{3}{19}.$$

In a similar manner, we have $P(C_2 \mid C) = \frac{16}{19}$.

In Example 5, the probabilities $P(C_1) = \frac{2}{6}$ and
$P(C_2) = \frac{4}{6}$ are called prior probabilities of $C_1$ and $C_2$,
respectively, because they are known to be due to the random mechanism used to
select the bowls. After the chip is taken and observed to be red, the
conditional probabilities $P(C_1 \mid C) = \frac{3}{19}$ and
$P(C_2 \mid C) = \frac{16}{19}$ are called posterior probabilities. Since
$C_2$ has a larger proportion of red chips than does $C_1$, it appeals to one’s
intuition that $P(C_2 \mid C)$ should be larger than $P(C_2)$ and, of course,
$P(C_1 \mid C)$ should be smaller than $P(C_1)$. That is, intuitively the
chances of having bowl $C_2$ are better once a red chip is observed
than before a chip is taken. Bayes’ theorem provides a method of
determining exactly what those probabilities are.


Example 6. Three plants, $C_1$, $C_2$, and $C_3$, produce, respectively, 10, 50,
and 40 percent of a company’s output. Although plant $C_1$ is a small plant,
its manager believes in high quality and only 1 percent of its products are
defective. The other two, $C_2$ and $C_3$, are worse and produce items that are
3 and 4 percent defective, respectively. All products are sent to a central
warehouse. One item is selected at random and observed to be defective, say
event $C$. The conditional probability that it comes from plant $C_1$ is found
as follows. It is natural to assign the respective prior probabilities of
getting an item from the plants as $P(C_1) = 0.1$, $P(C_2) = 0.5$, and
$P(C_3) = 0.4$, while the conditional probabilities of defective are
$P(C \mid C_1) = 0.01$, $P(C \mid C_2) = 0.03$, and $P(C \mid C_3) = 0.04$. Thus
the posterior probability of $C_1$, given a defective, is

$$P(C_1 \mid C) = \frac{P(C_1 \cap C)}{P(C)} = \frac{(0.10)(0.01)}{(0.10)(0.01) + (0.50)(0.03) + (0.40)(0.04)},$$

which equals $\frac{1}{32}$; this is much smaller than the prior probability
$P(C_1) = \frac{1}{10}$. This is as it should be because the fact that the item
is defective decreases the chances that it comes from the high-quality plant
$C_1$.
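
Examples 5 and 6 apply the same formula, so both can be reproduced with one small routine. The following Python sketch computes every posterior probability $P(C_j \mid C)$ from the priors $P(C_j)$ and the conditional probabilities $P(C \mid C_j)$; the function name `posterior` is a hypothetical helper introduced here, not something from the text.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(C_j | C) for every j, given P(C_j) and P(C | C_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(C_j) P(C | C_j)
    total = sum(joint)          # law of total probability: P(C)
    return [j / total for j in joint]

# Example 5: two bowls, a red chip is observed.
ex5 = posterior([Fraction(2, 6), Fraction(4, 6)],
                [Fraction(3, 10), Fraction(8, 10)])
print(ex5[0], ex5[1])  # 3/19 16/19

# Example 6: three plants, a defective item is observed.
ex6 = posterior([Fraction(10, 100), Fraction(50, 100), Fraction(40, 100)],
                [Fraction(1, 100), Fraction(3, 100), Fraction(4, 100)])
print(ex6[0])  # 1/32
```

The denominator is computed once and shared by all posteriors, which also guarantees that the returned probabilities add to 1.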


Sometimes it happens that the occurrence of event $C_1$ does not
change the probability of event $C_2$; that is, when $P(C_1) > 0$,

$$P(C_2 \mid C_1) = P(C_2).$$

In this case, we say that the events $C_1$ and $C_2$ are independent. Moreover,
the multiplication rule becomes

$$P(C_1 \cap C_2) = P(C_1)P(C_2 \mid C_1) = P(C_1)P(C_2).$$

This, in turn, implies, when $P(C_2) > 0$, that

$$P(C_1 \mid C_2) = \frac{P(C_1 \cap C_2)}{P(C_2)} = \frac{P(C_1)P(C_2)}{P(C_2)} = P(C_1).$$


Remark. Events that are independent are sometimes called statistically
independent, stochastically independent, or independent in a probability sense.
In most instances, we use independent without a modifier if there is no
possibility of misunderstanding.

It is interesting to note that $C_1$ and $C_2$ are independent if $P(C_1) = 0$
or $P(C_2) = 0$ because then $P(C_1 \cap C_2) = 0$ since
$(C_1 \cap C_2) \subset C_1$ and $(C_1 \cap C_2) \subset C_2$. Thus the left- and
right-hand members of

$$P(C_1 \cap C_2) = P(C_1)P(C_2)$$

are both equal to zero and are, of course, equal to each other. Also,
if $C_1$ and $C_2$ are independent events, so are the three pairs: $C_1$ and
$C_2^*$, $C_1^*$ and $C_2$, and $C_1^*$ and $C_2^*$ (see Exercise 1.41).

Example 7. A red die and a white die are cast in such a way that the
number of spots on the two sides that are up are independent events. If $C_1$
represents a four on the red die and $C_2$ represents a three on the white die,
with an equally likely assumption for each side, we assign
$P(C_1) = \frac{1}{6}$ and $P(C_2) = \frac{1}{6}$. Thus, from independence, the
probability of the ordered pair (red = 4, white = 3) is

$$P[(4, 3)] = \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) = \tfrac{1}{36}.$$

The probability that the sum of the up spots of the two dice equals seven is

$$P[(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)] = 6\left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) = \tfrac{6}{36}.$$

In a similar manner, it is easy to show that the probabilities of the sums of
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are, respectively,

$$\tfrac{1}{36}, \tfrac{2}{36}, \tfrac{3}{36}, \tfrac{4}{36}, \tfrac{5}{36}, \tfrac{6}{36}, \tfrac{5}{36}, \tfrac{4}{36}, \tfrac{3}{36}, \tfrac{2}{36}, \tfrac{1}{36}.$$
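
The distribution of the sum in Example 7 can be tabulated by enumerating the 36 ordered pairs and giving each pair the probability $\left(\frac{1}{6}\right)\left(\frac{1}{6}\right)$ that independence implies. This is a minimal sketch; the names `pairs`, `counts`, and `sum_probs` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pairs = list(product(range(1, 7), repeat=2))      # (red, white)
p_pair = Fraction(1, 6) * Fraction(1, 6)          # independence of the two dice

# Number of ordered pairs that produce each sum, then the probability of each sum.
counts = Counter(red + white for red, white in pairs)
sum_probs = {s: counts[s] * p_pair for s in range(2, 13)}

for s, p in sum_probs.items():
    print(s, p)   # fractions print in lowest terms, e.g. 7 -> 1/6
```

The counts run 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 over the sums 2 through 12, matching the numerators listed in the example.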

Suppose now that we have three events, $C_1$, $C_2$, and $C_3$. We say
that they are mutually independent if and only if they are pairwise
independent:

$$P(C_1 \cap C_3) = P(C_1)P(C_3), \quad P(C_1 \cap C_2) = P(C_1)P(C_2), \quad P(C_2 \cap C_3) = P(C_2)P(C_3),$$

and

$$P(C_1 \cap C_2 \cap C_3) = P(C_1)P(C_2)P(C_3).$$

More generally, the $n$ events $C_1, C_2, \ldots, C_n$ are mutually independent
if and only if for every collection of $k$ of these events, $2 \le k \le n$, the
following is true:

Say that $d_1, d_2, \ldots, d_k$ are $k$ distinct integers from
$1, 2, \ldots, n$; then

$$P(C_{d_1} \cap C_{d_2} \cap \cdots \cap C_{d_k}) = P(C_{d_1})P(C_{d_2}) \cdots P(C_{d_k}).$$

In particular, if $C_1, C_2, \ldots, C_n$ are mutually independent, then

$$P(C_1 \cap C_2 \cap \cdots \cap C_n) = P(C_1)P(C_2) \cdots P(C_n).$$

Also, as with two sets, many combinations of these events and their
complements are independent, such as

$C_1^*$ and $(C_2 \cup C_3^* \cup C_4)$ are independent;

$C_1 \cup C_2^*$, $C_3^*$, and $C_4 \cap C_5^*$ are mutually independent.

If there is no possibility of misunderstanding, independent is often used
without the modifier mutually when considering more than two events.
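
The definition of mutual independence requires the product rule for every subcollection of two or more events, not only for pairs. The sketch below checks this on a finite, equally likely sample space. The two-coin space and the three events are an illustration chosen here (they are not from the text): they are pairwise independent, yet the product rule fails for all three together.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed sample space: two tosses of a fair coin, all four outcomes equally likely.
space = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(space))

def pairwise_independent(events):
    """Product rule for every pair of events."""
    return all(P(a & b) == P(a) * P(b) for a, b in combinations(events, 2))

def mutually_independent(events):
    """Product rule for every subcollection of k events, 2 <= k <= n."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            intersection = set(space)
            product_of_probs = Fraction(1)
            for e in sub:
                intersection &= e
                product_of_probs *= P(e)
            if P(intersection) != product_of_probs:
                return False
    return True

C1 = {c for c in space if c[0] == "H"}      # head on the first toss
C2 = {c for c in space if c[1] == "H"}      # head on the second toss
C3 = {c for c in space if c[0] != c[1]}     # exactly one head

events = [C1, C2, C3]
print(pairwise_independent(events), mutually_independent(events))  # True False
```

Here $C_1 \cap C_2 \cap C_3$ is empty, so its probability is 0 rather than $\frac{1}{8}$, which is why the extra condition on the triple intersection is part of the definition.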

We often perform a sequence of random experiments in such a way 
that the events associated with one of them are independent of the 
events associated with the others. For convenience, we refer to these 
events as independent experiments, meaning that the respective events 
are independent. Thus we often refer to independent flips of a coin or
independent casts of a die or, more generally, independent trials of
some given random experiment.

Example 8. A coin is flipped independently several times. Let the event $C_i$
represent a head (H) on the $i$th toss; thus $C_i^*$ represents a tail (T).
Assume that $C_i$ and $C_i^*$ are equally likely; that is,
$P(C_i) = P(C_i^*) = \frac{1}{2}$. Thus the probability of an ordered sequence
like HHTH is, from independence,

$$P(C_1 \cap C_2 \cap C_3^* \cap C_4) = P(C_1)P(C_2)P(C_3^*)P(C_4) = \left(\tfrac{1}{2}\right)^4 = \tfrac{1}{16}.$$

Similarly, the probability of observing the first head on the third flip is

$$P(C_1^* \cap C_2^* \cap C_3) = P(C_1^*)P(C_2^*)P(C_3) = \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}.$$

Also, the probability of getting at least one head on four flips is

$$P(C_1 \cup C_2 \cup C_3 \cup C_4) = 1 - P[(C_1 \cup C_2 \cup C_3 \cup C_4)^*] = 1 - P(C_1^* \cap C_2^* \cap C_3^* \cap C_4^*) = 1 - \left(\tfrac{1}{2}\right)^4 = \tfrac{15}{16}.$$

See Exercise 1.43 to justify this last probability.
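
The three probabilities in Example 8 can be confirmed by enumerating all sequences of four flips. The sketch below also includes a hypothetical helper, `at_least_one`, that evaluates $1 - (1 - p_1)(1 - p_2)\cdots(1 - p_k)$ for independent events; that formula is the one stated in Exercise 1.43, and the helper name is an assumption introduced here.

```python
from fractions import Fraction
from itertools import product
from math import prod

half = Fraction(1, 2)
sequences = list(product("HT", repeat=4))   # 16 ordered sequences of four flips
p_sequence = half ** 4                      # independence: product of four factors

p_hhth = p_sequence                         # one particular ordered sequence
p_first_head_on_third = half ** 3           # T, T, H on the first three flips
p_at_least_one_head = sum(p_sequence for s in sequences if "H" in s)

def at_least_one(probs):
    """1 - (1 - p_1)(1 - p_2)...(1 - p_k) for independent events."""
    return 1 - prod(1 - p for p in probs)

print(p_hhth, p_first_head_on_third, p_at_least_one_head)  # 1/16 1/8 15/16
print(at_least_one([half] * 4))                            # 15/16
```

The enumeration and the complement formula agree because exactly one of the 16 sequences, TTTT, contains no head.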


EXERCISES 

1.32. If $P(C_1) > 0$ and if $C_2, C_3, C_4, \ldots$ are mutually disjoint sets,
show that
$P(C_2 \cup C_3 \cup \cdots \mid C_1) = P(C_2 \mid C_1) + P(C_3 \mid C_1) + \cdots$.

1.33. Prove that

$$P(C_1 \cap C_2 \cap C_3 \cap C_4) = P(C_1)P(C_2 \mid C_1)P(C_3 \mid C_1 \cap C_2)P(C_4 \mid C_1 \cap C_2 \cap C_3).$$

1.34. A bowl contains 8 chips. Three of the chips are red and 5 are blue. Four
chips are to be drawn successively at random and without replacement.

(a) Compute the probability that the colors alternate.

(b) Compute the probability that the first blue chip appears on the third
draw.

1.35. A hand of 13 cards is to be dealt at random and without replacement 
from an ordinary deck of playing cards. Find the conditional probability 
that there are at least three kings in the hand relative to the hypothesis that 
the hand contains at least two kings. 

1.36. A drawer contains eight pairs of socks. If six socks are taken at random 
and without replacement, compute the probability that there is at least one 
matching pair among these six socks. 

Hint: Compute the probability that there is not a matching pair. 

1.37. A bowl contains 10 chips. Four of the chips are red, 5 are white, and
1 is blue. If 3 chips are taken at random and without replacement, compute
the conditional probability that there is 1 chip of each color relative to the
hypothesis that there is exactly 1 red chip among the 3.

1.38. Bowl I contains 3 red chips and 7 blue chips. Bowl II contains 6 red chips
and 4 blue chips. A bowl is selected at random and then 1 chip is drawn
from this bowl.

(a) Compute the probability that this chip is red. 

(b) Relative to the hypothesis that the chip is red, find the conditional 
probability that it is drawn from bowl II. 

1.39. Bowl I contains 6 red chips and 4 blue chips. Five of these 10 chips are 
selected at random and without replacement and put in bowl II, which was 
originally empty. One chip is then drawn at random from bowl II. Relative 
to the hypothesis that this chip is blue, find the conditional probability that
2 red chips and 3 blue chips are transferred from bowl I to bowl II.

1.40. A professor of statistics has two boxes of computer disks: box $C_1$
contains seven Verbatim disks and three Control Data disks and box $C_2$
contains two Verbatim disks and eight Control Data disks. She selects a box
at random with probabilities $P(C_1) = \frac{2}{3}$ and $P(C_2) = \frac{1}{3}$
because of their respective locations. A disk is then selected at random and
the event $C$ occurs if it is from Control Data. Using an equally likely
assumption for each disk in the selected box, compute $P(C_1 \mid C)$ and
$P(C_2 \mid C)$.

1.41. If $C_1$ and $C_2$ are independent events, show that the following pairs
of events are also independent: (a) $C_1$ and $C_2^*$, (b) $C_1^*$ and $C_2$,
and (c) $C_1^*$ and $C_2^*$.

Hint: In (a), write
$P(C_1 \cap C_2^*) = P(C_1)P(C_2^* \mid C_1) = P(C_1)[1 - P(C_2 \mid C_1)]$.
From independence of $C_1$ and $C_2$, $P(C_2 \mid C_1) = P(C_2)$.

1.42. Let $C_1$ and $C_2$ be independent events with $P(C_1) = 0.6$ and
$P(C_2) = 0.3$. Compute (a) $P(C_1 \cap C_2)$; (b) $P(C_1 \cup C_2)$;
(c) $P(C_1 \cup C_2^*)$.

1.43. Generalize Exercise 1.4 to obtain

$$(C_1 \cup C_2 \cup \cdots \cup C_k)^* = C_1^* \cap C_2^* \cap \cdots \cap C_k^*.$$

Say that $C_1, C_2, \ldots, C_k$ are independent events that have respective
probabilities $p_1, p_2, \ldots, p_k$. Argue that the probability of at least
one of $C_1, C_2, \ldots, C_k$ is equal to

$$1 - (1 - p_1)(1 - p_2) \cdots (1 - p_k).$$

1.44. Each of four persons fires one shot at a target. Let $C_k$ denote the
event that the target is hit by person $k$, $k = 1, 2, 3, 4$. If $C_1$, $C_2$,
$C_3$, $C_4$ are independent and if $P(C_1) = P(C_2) = 0.7$, $P(C_3) = 0.9$, and
$P(C_4) = 0.4$, compute the probability that (a) all of them hit the target;
(b) exactly one hits the target; (c) no one hits the target; (d) at least one
hits the target.

1.45. A bowl contains three red (R) balls and seven white (W) balls of exactly
the same size and shape. Select balls successively at random and with
replacement so that the events of white on the first trial, white on the second,
and so on, can be assumed to be independent. In four trials, make certain
assumptions and compute the probabilities of the following ordered
sequences: (a) WWRW; (b) RWWW; (c) WWWR; and (d) WRWW.
(e) Compute the probability of exactly one red ball in the four trials.

1.46. A coin is tossed two independent times, each resulting in a tail (T) or 
a head (H). The sample space consists of four ordered pairs: TT, TH, HT, 
HH. Making certain assumptions, compute the probability of each of these 
ordered pairs. What is the probability of at least one head? 

1.5 Random Variables of the Discrete Type 

The reader will perceive that a sample space $\mathcal{C}$ may be tedious to
describe if the elements of $\mathcal{C}$ are not numbers. We shall now discuss
how we may formulate a rule, or a set of rules, by which the elements
$c$ of $\mathcal{C}$ may be represented by numbers. We begin the discussion with
a very simple example. Let the random experiment be the toss of a coin
and let the sample space associated with the experiment be
$\mathcal{C} = \{c : \text{where } c \text{ is T or } c \text{ is H}\}$ and T and
H represent, respectively, tails and heads. Let $X$ be a function such that
$X(c) = 0$ if $c$ is T and let $X(c) = 1$ if $c$ is H. Thus $X$ is a real-valued
function defined on the sample space $\mathcal{C}$ which takes us from the
sample space $\mathcal{C}$ to a space of real numbers $\mathcal{A} = \{0, 1\}$.
We call $X$ a random variable and, in this example, the space associated with
$X$ is $\mathcal{A} = \{0, 1\}$. We now formulate the definition of a random
variable and its space.

Definition 8. Consider a random experiment with a sample
space $\mathcal{C}$. A function $X$, which assigns to each element $c \in \mathcal{C}$ one and
only one real number $X(c) = x$, is called a random variable. The space
of $X$ is the set of real numbers $\mathcal{A} = \{x : x = X(c),\ c \in \mathcal{C}\}$.

It may be that the set $\mathcal{C}$ has elements which are themselves
real numbers. In such an instance we could write $X(c) = c$ so that
$\mathcal{A} = \mathcal{C}$.

Let $X$ be a random variable that is defined on a sample space $\mathcal{C}$
and let $\mathcal{A}$ be the space of $X$. Further, let $A$ be a subset of $\mathcal{A}$. Just as
we used the terminology "the event $C$," with $C \subset \mathcal{C}$, we shall now
speak of "the event $A$." The probability $P(C)$ of the event $C$ has been
defined. We wish now to define the probability of the event $A$. This
probability will be denoted by $\Pr(X \in A)$, where Pr is an abbreviation
for "the probability that." With $A$ a subset of $\mathcal{A}$, let $C$ be that subset
of $\mathcal{C}$ such that $C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A\}$. Thus $C$ has as its elements
all outcomes in $\mathcal{C}$ for which the random variable $X$ has a value that
is in $A$. This prompts us to define, as we now do, $\Pr(X \in A)$ to be equal
to $P(C)$, where $C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A\}$. Thus $\Pr(X \in A)$ is an
assignment of probability to a set $A$, which is a subset of the space $\mathcal{A}$
associated with the random variable $X$. This assignment is determined
by the probability set function $P$ and the random variable $X$ and is
sometimes denoted by $P_X(A)$. That is,

$$\Pr(X \in A) = P_X(A) = P(C),$$

where $C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A\}$. Thus a random variable $X$ is a
function that carries the probability from a sample space $\mathcal{C}$ to a space
$\mathcal{A}$ of real numbers. In this sense, with $A \subset \mathcal{A}$, the probability $P_X(A)$
is often called an induced probability.
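The definition of an induced probability can be traced in a few lines of code. The following is a minimal sketch, not part of the original text: it assumes, purely for illustration, a sample space of two tosses of a fair coin with equally likely outcomes, and the names `P`, `X`, and `induced_probability` are hypothetical.

```python
from fractions import Fraction

# Assumed sample space: two tosses of a fair coin, all outcomes equally likely.
P = {"TT": Fraction(1, 4), "TH": Fraction(1, 4),
     "HT": Fraction(1, 4), "HH": Fraction(1, 4)}


def X(c):
    """Random variable: the number of heads in the outcome c."""
    return c.count("H")


def induced_probability(A):
    """P_X(A) = P(C), where C = {c : c in the sample space and X(c) in A}."""
    C = [c for c in P if X(c) in A]
    return sum((P[c] for c in C), Fraction(0))


# The space of X is the set of values X(c) taken over the sample space.
space_of_X = {X(c) for c in P}

print(space_of_X)
print(induced_probability({1}))         # probability carried onto A = {1}
print(induced_probability(space_of_X))  # the whole space receives probability 1
```

Exact fractions are used so that the induced probabilities can be compared with hand computations without rounding.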

Remark. In a more advanced course, it would be noted that the random
variable $X$ is a Borel measurable function. This is needed to assure that we
can find the induced probabilities on the sigma field of the subsets of $\mathcal{A}$. We
need this requirement throughout this book for every function that is a
random variable, but no further mention of it is made.

The function $P_X(A)$ satisfies the conditions (a), (b), and (c) of the
definition of a probability set function (Section 1.3). That is, $P_X(A)$ is
also a probability set function. Conditions (a) and (c) are easily verified
by observing, for an appropriate $C$, that

$$P_X(A) = P(C) \ge 0,$$

and that $\mathcal{C} = \{c : c \in \mathcal{C} \text{ and } X(c) \in \mathcal{A}\}$ requires

$$P_X(\mathcal{A}) = P(\mathcal{C}) = 1.$$
In discussing the condition (b), let us restrict our attention to the two
mutually exclusive events $A_1$ and $A_2$. Here $P_X(A_1 \cup A_2) = P(C)$, where
$C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A_1 \cup A_2\}$. However,

$$C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A_1\} \cup \{c : c \in \mathcal{C} \text{ and } X(c) \in A_2\},$$

or, for brevity, $C = C_1 \cup C_2$. But $C_1$ and $C_2$ are disjoint sets. This must
be so, for if some $c$ were common, say $c_i$, then $X(c_i) \in A_1$ and $X(c_i) \in A_2$.
That is, the same number $X(c_i)$ belongs to both $A_1$ and $A_2$. This is a
contradiction because $A_1$ and $A_2$ are disjoint sets. Accordingly,

$$P(C) = P(C_1) + P(C_2).$$

However, by definition, $P(C_1)$ is $P_X(A_1)$ and $P(C_2)$ is $P_X(A_2)$ and thus

$$P_X(A_1 \cup A_2) = P_X(A_1) + P_X(A_2).$$

This is condition (b) for two disjoint sets.

Thus each of $P_X(A)$ and $P(C)$ is a probability set function. But the
reader should fully recognize that the probability set function $P$ is
defined for subsets $C$ of $\mathcal{C}$, whereas $P_X$ is defined for subsets $A$ of $\mathcal{A}$,
and, in general, they are not the same set function. Nevertheless, they
are closely related and some authors even drop the index $X$ and write
$P(A)$ for $P_X(A)$. They think it is quite clear that $P(A)$ means the
probability of $A$, a subset of $\mathcal{A}$, and $P(C)$ means the probability of $C$,
a subset of $\mathcal{C}$. From this point on, we shall adopt this convention and
simply write $P(A)$.

Perhaps an additional example will be helpful. Let a coin be
tossed two independent times and let our interest be in the number
of heads to be observed. Thus the sample space is
$\mathcal{C} = \{c : \text{where } c \text{ is TT or TH or HT or HH}\}$. Let $X(c) = 0$ if $c$ is TT;
let $X(c) = 1$ if $c$ is either TH or HT; and let $X(c) = 2$ if $c$ is HH. Thus the
space of the random variable $X$ is $\mathcal{A} = \{0, 1, 2\}$. Consider the subset $A$
of the space $\mathcal{A}$, where $A = \{1\}$. How is the probability of the event $A$
defined? We take the subset $C$ of $\mathcal{C}$ to have as its elements all
outcomes in $\mathcal{C}$ for which the random variable $X$ has a value that is an
element of $A$. Because $X(c) = 1$ if $c$ is either TH or HT, then
$C = \{c : \text{where } c \text{ is TH or HT}\}$. Thus $P(A) = \Pr(X \in A) = P(C)$. Since
$A = \{1\}$, then $P(A) = \Pr(X \in A)$ can be written more simply as
$\Pr(X = 1)$. Let $C_1 = \{c : c \text{ is TT}\}$, $C_2 = \{c : c \text{ is TH}\}$, $C_3 = \{c : c \text{ is HT}\}$,
and $C_4 = \{c : c \text{ is HH}\}$ denote subsets of $\mathcal{C}$. From independence and
equally likely assumptions (see Exercise 1.46), our probability set
function $P(C)$ assigns a probability of $\frac{1}{4}$ to each of the sets $C_i$,
$i = 1, 2, 3, 4$. Then $P(C_1) = \frac{1}{4}$, $P(C_2 \cup C_3) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$, and $P(C_4) = \frac{1}{4}$.
Let us now point out how much simpler it is to couch these statements
in a language that involves the random variable $X$. Because $X$ is the
number of heads to be observed in tossing a coin two times, we have

$$\Pr(X = 0) = \tfrac{1}{4}, \quad \text{since } P(C_1) = \tfrac{1}{4};$$

$$\Pr(X = 1) = \tfrac{1}{2}, \quad \text{since } P(C_2 \cup C_3) = \tfrac{1}{2};$$

and

$$\Pr(X = 2) = \tfrac{1}{4}, \quad \text{since } P(C_4) = \tfrac{1}{4}.$$

This may be further condensed in the following table:

    x            0      1      2
    Pr(X = x)   1/4    1/2    1/4

This table depicts the distribution of probability over the elements
of $\mathcal{A}$, the space of the random variable $X$. This can be written more
simply as

$$\Pr(X = x) = \binom{2}{x}\left(\tfrac{1}{2}\right)^{2}, \quad x \in \mathcal{A}.$$

Example 1. Consider a sequence of independent flips of a coin, each
resulting in a head (H) or a tail (T). Moreover, on each flip, we assume that
H and T are equally likely, that is, $P(\text{H}) = P(\text{T}) = \frac{1}{2}$. The sample space $\mathcal{C}$
consists of sequences like TTHTHHT···. Let the random variable $X$ equal
the number of flips needed to obtain the first head. For this given sequence,
$X = 3$. Clearly, the space of $X$ is $\mathcal{A} = \{1, 2, 3, 4, \ldots\}$. We see that $X = 1$ when
the sequence begins with an H and thus $\Pr(X = 1) = \frac{1}{2}$. Likewise, $X = 2$ when
the sequence begins with TH, which has probability $\Pr(X = 2) = (\frac{1}{2})(\frac{1}{2}) = \frac{1}{4}$
from the independence. More generally, if $X = x$, where $x = 1, 2, 3, 4, \ldots$,
there must be a string of $x - 1$ tails followed by a head, that is, TT···TH,
where there are $x - 1$ tails in TT···T. Thus, from independence, we have

$$\Pr(X = x) = \left(\tfrac{1}{2}\right)^{x-1}\left(\tfrac{1}{2}\right) = \left(\tfrac{1}{2}\right)^{x}, \quad x = 1, 2, 3, \ldots.$$

Let us make some observations about these three illustrations of
a random variable. In each case the number of points in the space $\mathcal{A}$
was finite, as with $\{0, 1\}$ and $\{0, 1, 2\}$, or countable, as with
$\{1, 2, 3, \ldots\}$. There was a function, say $f(x) = \Pr(X = x)$, that
described how the probability is distributed over the space $\mathcal{A}$. In each
of these illustrations, there is a simple formula (although that is not
necessary in general) for that function, namely:

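$$f(x) = \tfrac{1}{2}, \quad x \in \{0, 1\},$$

$$f(x) = \binom{2}{x}\left(\tfrac{1}{2}\right)^{2}, \quad x \in \{0, 1, 2\},$$

and

$$f(x) = \left(\tfrac{1}{2}\right)^{x}, \quad x \in \{1, 2, 3, \ldots\}.$$

Moreover, the sum of $f(x)$ over all $x \in \mathcal{A}$ equals 1:

$$\tfrac{1}{2} + \tfrac{1}{2} = 1,$$

$$\tfrac{1}{4} + \tfrac{1}{2} + \tfrac{1}{4} = 1,$$

and

$$\sum_{x=1}^{\infty} \left(\tfrac{1}{2}\right)^{x} = \frac{1/2}{1 - 1/2} = 1.$$

Finally, if $A \subset \mathcal{A}$, we can compute the probability of $X \in A$ by the
summation

$$\Pr(X \in A) = \sum_{A} f(x).$$

For illustrations, using the random variable of Example 1,

$$\Pr(X = 1, 2, 3) = \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} = \tfrac{7}{8}$$

and

$$\Pr(X = 1, 3, 5, \ldots) = \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^{3} + \left(\tfrac{1}{2}\right)^{5} + \cdots = \frac{1/2}{1 - 1/4} = \tfrac{2}{3}.$$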
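The two probabilities just computed for Example 1 can be checked numerically. This is a small illustrative sketch, assuming the p.d.f. $f(x) = (1/2)^x$ on the positive integers; the helper names `f` and `prob` are hypothetical, and the infinite sum over odd $x$ is only approximated by a finite partial sum.

```python
from fractions import Fraction


def f(x):
    """p.d.f. of Example 1: f(x) = (1/2)^x for x = 1, 2, 3, ..."""
    return Fraction(1, 2) ** x


def prob(A):
    """Pr(X in A) as the sum of f(x) over a finite collection A of points."""
    return sum((f(x) for x in A), Fraction(0))


# Exact finite sum: Pr(X = 1, 2, 3).
p_123 = prob({1, 2, 3})

# Partial sum over the odd values x = 1, 3, ..., 59; it approaches 2/3
# from below as more odd values are included.
p_odd_partial = prob(range(1, 60, 2))

print(p_123)
print(float(p_odd_partial))
```

The partial sum differs from 2/3 by the tail of a geometric series, which is why a modest number of terms already agrees with the closed form to many decimal places.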


We have special names for this type of random variable X and for a 
function $f(x)$ like that in each of these three illustrations, which we
now give. 

Let $X$ denote a random variable with a one-dimensional space $\mathcal{A}$.
Suppose that $\mathcal{A}$ consists of a countable number of points; that is, $\mathcal{A}$
contains a finite number of points or the points of $\mathcal{A}$ can be put into
a one-to-one correspondence with the positive integers. Such a space
$\mathcal{A}$ is called a discrete set of points. Let a function $f(x)$ be such that
$f(x) > 0$, $x \in \mathcal{A}$, and

$$\sum_{\mathcal{A}} f(x) = 1.$$

Whenever a probability set function $P(A)$, $A \subset \mathcal{A}$, can be expressed
in terms of such an $f(x)$ by

$$P(A) = \Pr(X \in A) = \sum_{A} f(x),$$

then $X$ is called a random variable of the discrete type and $f(x)$ is called
the probability density function of $X$. Hereafter the probability density
function is abbreviated p.d.f.

Our notation can be simplified somewhat so that we do not need
to spell out the space in each instance. For illustration, let the random
variable be the number of flips necessary to obtain the first head. We
now extend the definition of the p.d.f. from $\mathcal{A} = \{1, 2, 3, \ldots\}$ to
all the real numbers by writing

$$f(x) = \begin{cases} \left(\tfrac{1}{2}\right)^{x}, & x = 1, 2, 3, \ldots, \\ 0 & \text{elsewhere.} \end{cases}$$

From such a function, we see that the space $\mathcal{A}$ is clearly the set of
positive integers, which is a discrete set of points. Thus the
corresponding random variable is one of the discrete type.


Example 2. A lot, consisting of 100 fuses, is inspected by the following
procedure. Five of these fuses are chosen at random and tested; if all 5 "blow"
at the correct amperage, the lot is accepted. If, in fact, there are 20 defective
fuses in the lot, the probability of accepting the lot is, under appropriate
assumptions,

$$\frac{\binom{80}{5}}{\binom{100}{5}} = 0.32,$$

approximately. More generally, let the random variable $X$ be the number of
defective fuses among the 5 that are inspected. The p.d.f. of $X$ is given by

$$f(x) = \Pr(X = x) = \begin{cases} \dfrac{\binom{20}{x}\binom{80}{5-x}}{\binom{100}{5}}, & x = 0, 1, 2, 3, 4, 5, \\ 0 & \text{elsewhere.} \end{cases}$$

Clearly, the space of $X$ is $\mathcal{A} = \{0, 1, 2, 3, 4, 5\}$. Thus this is an example of
a random variable of the discrete type whose distribution is an illustration of
a hypergeometric distribution.
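The acceptance probability of Example 2 is easy to evaluate with integer binomial coefficients. The sketch below is illustrative only: it uses Python's standard `math.comb`, and the function name `f` with its keyword parameters `lot`, `defective`, and `sample` is a hypothetical packaging of the numbers in the example (100 fuses, 20 defective, 5 inspected).

```python
from math import comb


def f(x, lot=100, defective=20, sample=5):
    """Hypergeometric p.d.f.: Pr(X = x) defective fuses among those inspected."""
    if not 0 <= x <= sample:
        return 0.0  # zero elsewhere
    good = lot - defective
    return comb(defective, x) * comb(good, sample - x) / comb(lot, sample)


# The lot is accepted when none of the 5 inspected fuses is defective.
accept = f(0)

# The p.d.f. should sum to 1 over the space {0, 1, 2, 3, 4, 5}.
total = sum(f(x) for x in range(6))

print(round(accept, 2))
print(total)
```

Changing `defective` shows how quickly the acceptance probability falls as the lot gets worse, which is the practical point of such an inspection plan.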

Let the random variable $X$ have the probability set function $P(A)$,
where $A$ is a one-dimensional set. Take $x$ to be a real number
and consider the set $A$ which is an unbounded set from $-\infty$ to $x$,
including the point $x$ itself. For all such sets $A$ we have
$P(A) = \Pr(X \in A) = \Pr(X \le x)$. This probability depends on the
point $x$; that is, this probability is a function of the point $x$. This point
function is denoted by the symbol $F(x) = \Pr(X \le x)$. The function
$F(x)$ is called the distribution function (sometimes, cumulative
distribution function) of the random variable $X$. Since
$F(x) = \Pr(X \le x)$, then, with $f(x)$ the p.d.f., we have

$$F(x) = \sum_{w \le x} f(w),$$

for the discrete type.

Example 3. Let the random variable $X$ of the discrete type have the p.d.f.
$f(x) = x/6$, $x = 1, 2, 3$, zero elsewhere. The distribution function of $X$ is

$$F(x) = \begin{cases} 0, & x < 1, \\ \tfrac{1}{6}, & 1 \le x < 2, \\ \tfrac{3}{6}, & 2 \le x < 3, \\ 1, & 3 \le x. \end{cases}$$

FIGURE 1.3 (graph of the distribution function F(x), with steps at x = 1, 2, 3)

Here, as depicted in Figure 1.3, $F(x)$ is a step function that is constant in every
interval not containing 1, 2, or 3, but has steps of heights $\frac{1}{6}$, $\frac{2}{6}$, and $\frac{3}{6}$, which
are the probabilities at those respective points. It is also seen that $F(x)$ is
everywhere continuous from the right. The p.d.f. of $X$ is displayed as a bar
graph in Figure 1.4. We see that $f(x)$ represents the probability at each $x$
while $F(x)$ cumulates all the probability of points that are less than or equal
to $x$. Thus we can compute a probability like

$$\Pr(1.5 < X \le 4.5) = F(4.5) - F(1.5) = 1 - \tfrac{1}{6} = \tfrac{5}{6}$$

or as

$$\Pr(1.5 < X \le 4.5) = f(2) + f(3) = \tfrac{2}{6} + \tfrac{3}{6} = \tfrac{5}{6}.$$
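The two routes to $\Pr(1.5 < X \le 4.5)$ in Example 3 can be compared directly in code. This is a minimal sketch under the example's p.d.f. $f(x) = x/6$, $x = 1, 2, 3$; the dictionary `pdf` and the function `F` are illustrative names, not notation from the text.

```python
from fractions import Fraction

# p.d.f. of Example 3: f(x) = x/6 for x = 1, 2, 3, zero elsewhere.
pdf = {1: Fraction(1, 6), 2: Fraction(2, 6), 3: Fraction(3, 6)}


def F(x):
    """Distribution function F(x) = Pr(X <= x): sum of f(w) over all w <= x."""
    return sum((p for w, p in pdf.items() if w <= x), Fraction(0))


# Route 1: difference of the distribution function at the endpoints.
via_F = F(4.5) - F(1.5)

# Route 2: add the p.d.f. over the points of the space inside (1.5, 4.5].
via_f = pdf[2] + pdf[3]

print(via_F, via_f)
```

Evaluating `F` at points between the integers also shows the step behavior: the value stays constant until the next point of the space is reached.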

While the properties of a distribution function $F(x) = \Pr(X \le x)$ are
discussed in more detail in Section 1.7, we can make a few observations
now since $F(x)$ is a probability.

1. $0 \le F(x) \le 1$.

2. $F(x)$ is a nondecreasing function as it cumulates probability as $x$
increases.

3. $F(y) = 0$ for every point $y$ that is less than the smallest value in the
space of $X$.

4. $F(z) = 1$ for every point $z$ that is greater than the largest value in
the space of $X$.

5. If $X$ is a random variable of the discrete type, then $F(x)$ is a step
function and the height of the step at $x$ in the space of $X$ is equal
to the probability $f(x) = \Pr(X = x)$.


FIGURE 1.4 (bar graph of the p.d.f. f(x))

EXERCISES

1.47. Let a card be selected from an ordinary deck of playing cards. The
outcome $c$ is one of these 52 cards. Let $X(c) = 4$ if $c$ is an ace, let $X(c) = 3$
if $c$ is a king, let $X(c) = 2$ if $c$ is a queen, let $X(c) = 1$ if $c$ is a jack, and
let $X(c) = 0$ otherwise. Suppose that $P$ assigns a probability of $\frac{1}{52}$ to
each outcome $c$. Describe the induced probability $P_X(A)$ on the space
$\mathcal{A} = \{0, 1, 2, 3, 4\}$ of the random variable $X$.

1.48. For each of the following, find the constant $c$ so that $f(x)$ satisfies the
condition of being a p.d.f. of one random variable $X$.

(a) $f(x) = c(\frac{2}{3})^{x}$, $x = 1, 2, 3, \ldots$, zero elsewhere.

(b) $f(x) = cx$, $x = 1, 2, 3, 4, 5, 6$, zero elsewhere.

1.49. Let $f(x) = x/15$, $x = 1, 2, 3, 4, 5$, zero elsewhere, be the p.d.f. of $X$.
Find $\Pr(X = 1 \text{ or } 2)$, $\Pr(\frac{1}{2} < X < \frac{5}{2})$, and $\Pr(1 \le X \le 2)$.

1.50. Let $f(x)$ be the p.d.f. of a random variable $X$. Find the distribution
function $F(x)$ of $X$ and sketch its graph along with that of $f(x)$ if:

(a) $f(x) = 1$, $x = 0$, zero elsewhere.

(b) $f(x) = \frac{1}{3}$, $x = -1, 0, 1$, zero elsewhere.

(c) $f(x) = x/15$, $x = 1, 2, 3, 4, 5$, zero elsewhere.

1.51. Let us select five cards at random and without replacement from an 
ordinary deck of playing cards. 

(a) Find the p.d.f. of X, the number of hearts in the five cards. 

(b) Determine $\Pr(X \le 1)$.

1.52. Let X equal the number of heads in four independent flips of a coin. 
Using certain assumptions, determine the p.d.f. of X and compute the 
probability that X is equal to an odd number. 

1.53. Let $X$ have the p.d.f. $f(x) = x/5050$, $x = 1, 2, 3, \ldots, 100$, zero
elsewhere.

(a) Compute $\Pr(X \le 50)$.

(b) Show that the distribution function of $X$ is $F(x) = [x]([x] + 1)/10100$,
for $1 \le x \le 100$, where $[x]$ is the greatest integer in $x$.

1.54. Let a bowl contain 10 chips of the same size and shape. One and only 
one of these chips is red. Continue to draw chips from the bowl, one at a 
time and at random and without replacement, until the red chip is drawn. 

(a) Find the p.d.f. of $X$, the number of trials needed to draw the red chip.

(b) Compute $\Pr(X \le 4)$.

1.55. Cast a die a number of independent times until a six appears on the up 
side of the die. 

(a) Find the p.d.f. $f(x)$ of $X$, the number of casts needed to obtain that
first six.

(b) Show that $\sum_{x=1}^{\infty} f(x) = 1$.

(c) Determine $\Pr(X = 1, 3, 5, 7, \ldots)$.

(d) Find the distribution function $F(x) = \Pr(X \le x)$.

1.56. Cast a die two independent times and let $X$ equal the absolute value of
the difference of the two resulting values (the numbers on the up sides). Find
the p.d.f. of $X$.

Hint: It is not necessary to find a formula for the p.d.f.


1.6 Random Variables of the Continuous Type 

A random variable was defined in Section 1.5, and only those of the 
discrete type were considered there. Let us begin the discussion of 
random variables of the continuous type with an example. 

Let a random experiment be a selection of a point that is interior
to a circle of radius 1 that has center at the origin of a two-dimensional
space. We call this space $\mathcal{C}$ and the area of this circle is $\pi$. The random
selection is in such a way that the probability of being in a certain set
$C$ interior to $\mathcal{C}$ is proportional to the area of $C$; in particular, if $C \subset \mathcal{C}$,

$$P(C) = \frac{\text{area of } C}{\pi}.$$

First we observe that $P(\mathcal{C}) = 1$. In addition, if $C_1$ is that subset of
$\mathcal{C}$ that is in the first quadrant, $P(C_1) = (\pi/4)/\pi = \frac{1}{4}$. If $C_2$ is the interior
of a circle of radius $\frac{1}{2}$ such that $C_2 \subset \mathcal{C}$, then $P(C_2) = \pi(\frac{1}{2})^2/\pi = \frac{1}{4}$. It is
interesting to note that the probability of a point, a line segment, or
any curve in $\mathcal{C}$ is equal to zero because those areas would be zero. In
particular, if $C_3$ is the boundary of the set $C_2$ (that is, $C_3$ is the actual
circle of radius $\frac{1}{2}$), then $P(C_3) = 0$.

We define a random variable $X$, associated with this random
experiment, as the distance of the selected point from the origin. The
space of $X$ is $\mathcal{A} = \{x : 0 \le x < 1\}$. Of course, for any $x \in \mathcal{A}$,
$\Pr(X = x) = 0$, because $X = x$ is the event that the random point falls
on a circle, symmetric with respect to the origin, of radius $x$ and the
associated area equals zero. However, it does make sense to consider
the induced probability of the event $X \le x$, namely the distribution
function of $X$. If $x \in \mathcal{A}$, then

$$F(x) = \Pr(X \le x) = \frac{\text{area of a certain circle of radius } x}{\pi} = \frac{\pi x^2}{\pi} = x^2, \quad 0 \le x < 1.$$

Clearly, if $x < 0$, then $F(x) = 0$; and if $x \ge 1$, then $F(x) = 1$. Thus we
can write

$$F(x) = \begin{cases} 0, & x < 0, \\ x^2, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$
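The formula $F(x) = x^2$ can be checked against a simulation of the experiment. The following is a rough sketch, not a method from the text: it assumes points are drawn uniformly in the unit disk by rejection sampling from the enclosing square, and the names `sample_distance` and `empirical_F`, the sample size, and the seed are arbitrary illustrative choices.

```python
import random


def sample_distance(rng):
    """Distance from the origin of a point chosen uniformly inside the unit
    circle, obtained by rejection sampling from the square [-1, 1] x [-1, 1]."""
    while True:
        u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r2 = u * u + v * v
        if r2 < 1:  # keep only points interior to the circle
            return r2 ** 0.5


def empirical_F(x, n=100_000, seed=0):
    """Fraction of n simulated distances that are <= x; estimates F(x)."""
    rng = random.Random(seed)
    hits = sum(sample_distance(rng) <= x for _ in range(n))
    return hits / n


for x in (0.25, 0.5, 0.75):
    # The estimate should be close to x**2, up to sampling error.
    print(x, empirical_F(x), x ** 2)
```

Because the estimate is a relative frequency, it only approximates $x^2$; the agreement improves roughly like $1/\sqrt{n}$ as the number of simulated points grows.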

Recall, in the discrete case, we had a function $f$ that was associated
with $F$ through the equation

$$F(x) = \sum_{w \le x} f(w).$$

Either $F$ or $f$ could be used to compute probabilities like

$$\Pr(a < X \le b) = F(b) - F(a) = \sum_{w \in A} f(w),$$

where $A = \{w : a < w \le b\}$. We have observed, in this continuous case,
that $\Pr(X = x) = 0$, so a summation of such probabilities is no longer
appropriate. However, it is easy to find an integral that relates $F$ to $f$
through

$$F(x) = \int_{w \le x} f(w)\,dw.$$

Since $\mathcal{A} = \{x : 0 \le x < 1\}$, this can be written as

$$F(x) = x^2 = \int_0^x f(w)\,dw, \quad x \in \mathcal{A}.$$

By one form of the fundamental theorem of calculus, we know that the
derivative of the right-hand member of this equation is $f(x)$. Thus
taking derivatives of each member of the equation, we obtain

$$2x = f(x), \quad 0 \le x < 1.$$

Of course, at $x = 0$, this is only a right-hand derivative. We observe
that $f(x) \ge 0$, $x \in \mathcal{A}$, and

$$\int_0^1 2x\,dx = 1.$$

Probabilities can now be computed through

$$\Pr(X \in A) = \int_A f(w)\,dw.$$
For illustration,

$$\Pr\left(\tfrac{1}{4} < X \le \tfrac{1}{2}\right) = \int_{1/4}^{1/2} 2w\,dw = \left[w^2\right]_{1/4}^{1/2} = \tfrac{3}{16}.$$
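Integrals of a p.d.f. like the one above can also be approximated numerically. The sketch below is only an illustration, assuming the p.d.f. $f(w) = 2w$ on $0 \le w < 1$ from the distance example; `integrate` is a hypothetical helper implementing the midpoint rule, chosen for simplicity rather than accuracy in general.

```python
def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(w):
    """p.d.f. of the distance example: f(w) = 2w on 0 <= w < 1, zero elsewhere."""
    return 2 * w if 0 <= w < 1 else 0.0


total = integrate(f, 0.0, 1.0)   # total probability over the space
p = integrate(f, 0.25, 0.5)      # probability of the interval from 1/4 to 1/2

print(total, p)
```

For this linear integrand the midpoint rule is exact up to floating-point rounding, so the printed values agree with $1$ and $F(1/2) - F(1/4) = 1/4 - 1/16$ very closely.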



With the background of this example, we give the definition of a 
random variable of the continuous type. 

Let $X$ denote a random variable with a one-dimensional space $\mathcal{A}$,
which consists of an interval or a union of intervals. Let a function $f(x)$
be nonnegative such that

$$\int_{\mathcal{A}} f(x)\,dx = 1.$$

Whenever a probability set function $P(A)$, $A \subset \mathcal{A}$, can be expressed
in terms of such an $f(x)$ by

$$P(A) = \Pr(X \in A) = \int_A f(x)\,dx,$$

then $X$ is said to be a random variable of the continuous type and $f(x)$
is called the probability density function (p.d.f.) of $X$.

Example 1. Let the random variable of the continuous type $X$ equal the
distance in feet between bad records of a used computer tape. Say that the
space of $X$ is $\mathcal{A} = \{x : 0 < x < \infty\}$. Suppose that a reasonable probability
model for $X$ is given by the p.d.f.

$$f(x) = \tfrac{1}{40} e^{-x/40}, \quad x \in \mathcal{A}.$$

Here $f(x) \ge 0$ for $x \in \mathcal{A}$, and

$$\int_0^{\infty} \tfrac{1}{40} e^{-x/40}\,dx = \left[-e^{-x/40}\right]_0^{\infty} = 1.$$

If we are interested in the probability that the distance between bad records
is greater than 40 feet, then $A = \{x : 40 < x < \infty\}$ and

$$\Pr(X \in A) = \int_{40}^{\infty} \tfrac{1}{40} e^{-x/40}\,dx = e^{-1}.$$

The p.d.f. and the probability of interest are depicted in Figure 1.5.
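The probability in this example has a closed form, so a short computation gives its decimal value. This is a minimal sketch assuming the model $f(x) = \frac{1}{40}e^{-x/40}$; the distribution function used below, $F(x) = 1 - e^{-x/40}$ for $x > 0$, is obtained by integrating that p.d.f. and is not stated in the example itself, and the parameter name `mean` is a hypothetical label for the constant 40.

```python
import math


def F(x, mean=40.0):
    """Distribution function of the model f(x) = (1/mean) e^{-x/mean}, x > 0,
    obtained by integrating the p.d.f. from 0 to x."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


# Pr(X > 40) = 1 - F(40) = e^{-1}
p_greater_40 = 1.0 - F(40.0)

print(round(p_greater_40, 4))
```

The same function gives other interval probabilities by subtraction, for example `F(80.0) - F(40.0)` for a distance between 40 and 80 feet.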

FIGURE 1.5 (graph of the p.d.f. f(x))

If we restrict ourselves to random variables of either the discrete
type or the continuous type, we may work exclusively with the p.d.f.
$f(x)$. This affords an enormous simplification; but it should be
recognized that this simplification is obtained at considerable cost from
a mathematical point of view. Not only shall we exclude from
consideration many random variables that do not have these types of
distributions, but we shall also exclude many interesting subsets of the
space. In this book, however, we shall in general restrict ourselves to
these simple types of random variables.

Remarks. Let $X$ denote the number of spots that show when a die is cast.
We can assume that $X$ is a random variable with $\mathcal{A} = \{1, 2, \ldots, 6\}$ and with
a p.d.f. $f(x) = \frac{1}{6}$, $x \in \mathcal{A}$. Other assumptions can be made to provide different
mathematical models for this experiment. Experimental evidence can be used
to help one decide which model is the more realistic. Next, let $X$ denote the
point at which a balanced pointer comes to rest. If the circumference is
graduated $0 \le x < 1$, a reasonable mathematical model for this experiment is
to take $X$ to be a random variable with $\mathcal{A} = \{x : 0 \le x < 1\}$ and with a p.d.f.
$f(x) = 1$, $x \in \mathcal{A}$.

Both types of probability density functions can be used as distributional
models for many random variables found in real situations. For
illustrations consider the following. If $X$ is the number of automobile
accidents during a given day, then $f(0), f(1), f(2), \ldots$ represent the probabilities
of 0, 1, 2, ... accidents. On the other hand, if $X$ is length of life of a female
born in a certain community, the integral [area under the graph of $f(x)$ that
lies above the x-axis and between the vertical lines $x = 40$ and $x = 50$]

$$\int_{40}^{50} f(x)\,dx$$

represents the probability that she dies between 40 and 50 (or the percentage
of those females dying between 40 and 50). A particular $f(x)$ will be suggested
later for each of these situations, but again experimental evidence must be used
to decide whether we have realistic models.
Our notation can be considerably simplified when we restrict
ourselves to random variables of the continuous or discrete types.
Suppose that the space of a continuous type of random variable $X$ is
$\mathcal{A} = \{x : 0 < x < \infty\}$ and that the p.d.f. of $X$ is $e^{-x}$, $x \in \mathcal{A}$. We shall
in no manner alter the distribution of $X$ [that is, alter any $P(A)$, $A \subset \mathcal{A}$]
if we extend the definition of the p.d.f. of $X$ by writing

$$f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0 & \text{elsewhere,} \end{cases}$$

and then refer to $f(x)$ as the p.d.f. of $X$. We have

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} e^{-x}\,dx = 1.$$

Thus we may treat the entire axis of reals as though it were the space
of $X$. Accordingly, we now replace

$$\int_{\mathcal{A}} f(x)\,dx \quad \text{by} \quad \int_{-\infty}^{\infty} f(x)\,dx.$$

If $f(x)$ is the p.d.f. of a continuous type of random variable $X$ and
if $A$ is the set $\{x : a < x < b\}$, then $P(A) = \Pr(X \in A)$ can be written
as

$$\Pr(a < X < b) = \int_a^b f(x)\,dx.$$

Moreover, if $A = \{a\}$, then

$$P(A) = \Pr(X \in A) = \Pr(X = a) = \int_a^a f(x)\,dx = 0,$$

since the integral $\int_a^a f(x)\,dx$ is defined in calculus to be zero. That is, if
$X$ is a random variable of the continuous type, the probability of every
set consisting of a single point is zero. This fact enables us to write, say,

$$\Pr(a < X < b) = \Pr(a \le X \le b).$$

More important, this fact allows us to change the value of the p.d.f.
of a continuous type of random variable $X$ at a single point without
altering the distribution of $X$. For instance, the p.d.f.

$$f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0 & \text{elsewhere,} \end{cases}$$

can be written as

$$f(x) = \begin{cases} e^{-x}, & 0 \le x < \infty, \\ 0 & \text{elsewhere,} \end{cases}$$

without changing any $P(A)$. We observe that these two functions differ
only at $x = 0$ and $\Pr(X = 0) = 0$. More generally, if two probability
density functions of random variables of the continuous type differ
only on a set having probability zero, the two corresponding
probability set functions are exactly the same. Unlike the continuous
type, the p.d.f. of a discrete type of random variable may not be
changed at any point, since a change in such a p.d.f. alters the
distribution of probability.

Example 2. Let the random variable $X$ of the continuous type have
the p.d.f. $f(x) = 2/x^3$, $1 < x < \infty$, zero elsewhere. The distribution function
of $X$ is

$$F(x) = \begin{cases} \displaystyle\int_{-\infty}^{x} 0\,dw = 0, & x < 1, \\[2ex] \displaystyle\int_{1}^{x} \frac{2}{w^3}\,dw = 1 - \frac{1}{x^2}, & 1 \le x. \end{cases}$$

The graph of this distribution function is depicted in Figure 1.6. Here $F(x)$ is a
continuous function for all real numbers $x$; in particular, $F(x)$ is everywhere
continuous from the right. Moreover, the derivative of $F(x)$ with respect to
$x$ exists at all points except at $x = 1$. Thus the p.d.f. of $X$ is defined by this
derivative except at $x = 1$. Since the set $A = \{1\}$ is a set of probability measure
zero [that is, $P(A) = 0$], we are free to define the p.d.f. at $x = 1$ in any manner
we please. One way to do this is to write $f(x) = 2/x^3$, $1 < x < \infty$, zero
elsewhere.

FIGURE 1.6
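The relation between $F$ and $f$ in Example 2, including the failure of differentiability at $x = 1$, can be seen with finite differences. This is an illustrative sketch only; the step size `h` and the helper `central_difference` are arbitrary choices made for the demonstration.

```python
def F(x):
    """Distribution function of Example 2: 0 for x < 1, 1 - 1/x^2 for x >= 1."""
    return 0.0 if x < 1 else 1.0 - 1.0 / x ** 2


def f(x):
    """p.d.f. of Example 2: 2/x^3 for x > 1, zero elsewhere."""
    return 2.0 / x ** 3 if x > 1 else 0.0


def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference approximation of the derivative of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)


# Away from x = 1 the numerical derivative of F matches f.
for x in (0.5, 1.5, 2.0, 10.0):
    print(x, central_difference(F, x), f(x))

# At x = 1 the one-sided slopes differ, so F has no derivative there.
h = 1e-6
left_slope = (F(1.0) - F(1.0 - h)) / h
right_slope = (F(1.0 + h) - F(1.0)) / h
print(left_slope, right_slope)
```

The left slope at 1 is zero while the right slope is near 2, which is the numerical face of the statement that the derivative exists everywhere except at $x = 1$.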
EXERCISES

1.57. Let a point be selected from the sample space $\mathcal{C} = \{c : 0 < c < 10\}$. Let
$C \subset \mathcal{C}$ and let the probability set function be $P(C) = \int_C \frac{1}{10}\,dz$. Define the
random variable $X$ to be $X(c) = c^2$. Find the distribution function and the
p.d.f. of $X$.

1.58. Let the probability set function $P(A)$ of the random variable $X$ be
$P(A) = \int_A f(x)\,dx$, where $f(x) = 2x/9$, $x \in \mathcal{A} = \{x : 0 < x < 3\}$. Let
$A_1 = \{x : 0 < x < 1\}$, $A_2 = \{x : 2 < x < 3\}$. Compute $P(A_1) = \Pr(X \in A_1)$,
$P(A_2) = \Pr(X \in A_2)$, and $P(A_1 \cup A_2) = \Pr(X \in A_1 \cup A_2)$.

1.59. Let the space of the random variable $X$ be $\mathcal{A} = \{x : 0 < x < 1\}$. If
$A_1 = \{x : 0 < x < \frac{1}{2}\}$ and $A_2 = \{x : \frac{1}{2} \le x < 1\}$, find $P(A_2)$ if $P(A_1) = \frac{1}{4}$.

1.60. Let the space of the random variable $X$ be $\mathcal{A} = \{x : 0 < x < 10\}$ and
let $P(A_1) = \frac{3}{8}$, where $A_1 = \{x : 1 < x < 5\}$. Show that $P(A_2) \le \frac{5}{8}$, where
$A_2 = \{x : 5 \le x < 10\}$.

1.61. Let the subsets $A_1 = \{x : \frac{1}{4} < x < \frac{1}{2}\}$ and $A_2 = \{x : \frac{1}{2} \le x < 1\}$ of the
space $\mathcal{A} = \{x : 0 < x < 1\}$ of the random variable $X$ be such that $P(A_1) = \frac{1}{8}$
and $P(A_2) = \frac{1}{2}$. Find $P(A_1 \cup A_2)$, $P(A_1^*)$, and $P(A_1^* \cap A_2^*)$.

1.62. Given $\int_A [1/\pi(1 + x^2)]\,dx$, where $A \subset \mathcal{A} = \{x : -\infty < x < \infty\}$. Show
that the integral could serve as a probability set function of a random
variable $X$ whose space is $\mathcal{A}$.

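1.63. Let the probability set function of the random variable $X$ be

$$P(A) = \int_A e^{-x}\,dx, \quad \text{where } \mathcal{A} = \{x : 0 < x < \infty\}.$$

Let $A_k = \{x : 2 - 1/k < x \le 3\}$, $k = 1, 2, 3, \ldots$. Find $\lim_{k \to \infty} A_k$ and
$P\left(\lim_{k \to \infty} A_k\right)$. Find $P(A_k)$ and $\lim_{k \to \infty} P(A_k)$. Note that
$\lim_{k \to \infty} P(A_k) = P\left(\lim_{k \to \infty} A_k\right)$.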

1.64. For each of the following probability density functions of $X$, compute
$\Pr(|X| < 1)$ and $\Pr(X^2 < 9)$.

(a) $f(x) = x^2/18$, $-3 < x < 3$, zero elsewhere.

(b) $f(x) = (x + 2)/18$, $-2 < x < 4$, zero elsewhere.

1.65. Let $f(x) = 1/x^2$, $1 < x < \infty$, zero elsewhere, be the p.d.f. of $X$.
If $A_1 = \{x : 1 < x < 2\}$ and $A_2 = \{x : 4 < x < 5\}$, find $P(A_1 \cup A_2)$ and
$P(A_1 \cap A_2)$.

1.66. A mode of a distribution of one random variable $X$ is a value of $x$ that
maximizes the p.d.f. $f(x)$. For $X$ of the continuous type, $f(x)$ must be
continuous. If there is only one such $x$, it is called the mode of the
distribution. Find the mode of each of the following distributions:

(a) $f(x) = (\frac{1}{2})^{x}$, $x = 1, 2, 3, \ldots$, zero elsewhere.

(b) $f(x) = 12x^2(1 - x)$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = (\frac{1}{2})x^2 e^{-x}$, $0 < x < \infty$, zero elsewhere.

1.67. A median of a distribution of one random variable $X$ of the
discrete or continuous type is a value of $x$ such that $\Pr(X < x) \le \frac{1}{2}$
and $\Pr(X \le x) \ge \frac{1}{2}$. If there is only one such $x$, it is called the
median of the distribution. Find the median of each of the following
distributions:

(a) $f(x) = \dfrac{4!}{x!\,(4 - x)!} \left(\dfrac{1}{4}\right)^{x} \left(\dfrac{3}{4}\right)^{4-x}$, $x = 0, 1, 2, 3, 4$, zero elsewhere.

(b) $f(x) = 3x^2$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = \dfrac{1}{\pi(1 + x^2)}$, $-\infty < x < \infty$.

Hint: In parts (b) and (c), $\Pr(X < x) = \Pr(X \le x)$ and thus that
common value must equal $\frac{1}{2}$ if $x$ is to be the median of the distribution.


1.68. Let $0 < p < 1$. A $(100p)$th percentile (quantile of order $p$) of the
distribution of a random variable $X$ is a value $\xi_p$ such that $\Pr(X < \xi_p) \le p$
and $\Pr(X \le \xi_p) \ge p$. Find the twentieth percentile of the distribution that
has p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere.

Hint: With a continuous-type random variable $X$, $\Pr(X < \xi_p) = \Pr(X \le \xi_p)$
and hence that common value must equal $p$.


1.69. Find the distribution function $F(x)$ associated with each of the following
probability density functions. Sketch the graphs of $f(x)$ and $F(x)$.

(a) $f(x) = 3(1 - x)^2$, $0 < x < 1$, zero elsewhere.

(b) $f(x) = 1/x^2$, $1 < x < \infty$, zero elsewhere.

(c) $f(x) = \frac{1}{3}$, $0 < x < 1$ or $2 < x < 4$, zero elsewhere.

Also find the median and 25th percentile of each of these distributions.


1.70. Consider the distribution function $F(x) = 1 - e^{-x} - xe^{-x}$, $0 \le x < \infty$,
zero elsewhere. Find the p.d.f., the mode, and the median (by numerical
methods) of this distribution.


1.7 Properties of the Distribution Function

In Section 1.5 we defined the distribution function of a
random variable $X$ as $F(x) = \Pr(X \le x)$. This concept was used
in Section 1.6 to find the probability distribution of a random
variable of the continuous type. So, in terms of the p.d.f. $f(x)$, we know
that

$$F(x) = \sum_{w \le x} f(w),$$

for the discrete type of random variable, and

$$F(x) = \int_{-\infty}^{x} f(w)\,dw,$$

for the continuous type of random variable. We speak of a distribution
function $F(x)$ as being of the continuous or discrete type, depending
on whether the random variable is of the continuous or discrete type.

Remark. If $X$ is a random variable of the continuous type, the p.d.f. $f(x)$
has at most a finite number of discontinuities in every finite interval. This
means (1) that the distribution function $F(x)$ is everywhere continuous and (2)
that the derivative of $F(x)$ with respect to $x$ exists and is equal to $f(x)$ at each
point of continuity of $f(x)$. That is, $F'(x) = f(x)$ at each point of continuity
of $f(x)$. If the random variable $X$ is of the discrete type, most surely the p.d.f.
$f(x)$ is not the derivative of $F(x)$ with respect to $x$ (that is, with respect to
Lebesgue measure); but $f(x)$ is the (Radon-Nikodym) derivative of $F(x)$ with
respect to a counting measure. A derivative is often called a density.
Accordingly, we call these derivatives probability density functions.

There are several properties of a distribution function that can
be listed as a consequence of the properties of the probability set
function. Some of these are the following. In listing these properties,
we shall not restrict $X$ to be a random variable of the discrete or
continuous type. We shall use the symbols $F(\infty)$ and $F(-\infty)$ to mean
$\lim_{x \to \infty} F(x)$ and $\lim_{x \to -\infty} F(x)$, respectively. In like manner, the symbols
$\{x : x \le \infty\}$ and $\{x : x \le -\infty\}$ represent, respectively, the limits of the
sets $\{x : x \le b\}$ and $\{x : x \le -b\}$ as $b \to \infty$.

1. $0 \le F(x) \le 1$ because $0 \le \Pr(X \le x) \le 1$.

2. $F(x)$ is a nondecreasing function of $x$. For, if $x' < x''$, then

$$\{x : x \le x''\} = \{x : x \le x'\} \cup \{x : x' < x \le x''\}$$

and

$$\Pr(X \le x'') = \Pr(X \le x') + \Pr(x' < X \le x'').$$

That is,

$$F(x'') - F(x') = \Pr(x' < X \le x'') \ge 0.$$

3. $F(\infty) = 1$ and $F(-\infty) = 0$ because the set $\{x : x \le \infty\}$ is the
entire one-dimensional space and the set $\{x : x \le -\infty\}$ is the null set.
From the proof of property 2, it is observed that, if $a < b$, then

$$\Pr(a < X \le b) = F(b) - F(a).$$

Suppose that we want to use $F(x)$ to compute the probability
$\Pr(X = b)$. To do this, consider, with $h > 0$,

$$\lim_{h \to 0} \Pr(b - h < X \le b) = \lim_{h \to 0} [F(b) - F(b - h)].$$

Intuitively, it seems that $\lim_{h \to 0} \Pr(b - h < X \le b)$ should exist and be
equal to $\Pr(X = b)$ because, as $h$ tends to zero, the limit of the set
$\{x : b - h < x \le b\}$ is the set that contains the single point $x = b$. The
fact that this limit is $\Pr(X = b)$ is a theorem that we accept without
proof. Accordingly, we have

$$\Pr(X = b) = F(b) - F(b-),$$

where $F(b-)$ is the left-hand limit of $F(x)$ at $x = b$. That is, the
probability that $X = b$ is the height of the step that $F(x)$ has at $x = b$.
Hence, if the distribution function $F(x)$ is continuous at $x = b$, then
$\Pr(X = b) = 0$.

There is a fourth property of $F(x)$ that is now listed.

4. $F(x)$ is continuous from the right, that is, right-continuous.

To prove this property, consider, with $h > 0$,

$$\lim_{h \to 0} \Pr(a < X \le a + h) = \lim_{h \to 0} [F(a + h) - F(a)].$$

We accept without proof a theorem which states, with $h > 0$, that

$$\lim_{h \to 0} \Pr(a < X \le a + h) = P(\emptyset) = 0.$$

Here also, the theorem is intuitively appealing because, as $h$ tends to
zero, the limit of the set $\{x : a < x \le a + h\}$ is the null set. Accordingly,
we write

$$0 = F(a+) - F(a),$$

where $F(a+)$ is the right-hand limit of $F(x)$ at $x = a$. Hence $F(x)$ is
continuous from the right at every point $x = a$.


Remark. In the arguments concerning several of these properties, we
appeal to the reader's intuition. However, most of these properties can be
proved in formal ways using the definition of $\lim_{k \to \infty} A_k$, given in Exercises 1.7
and 1.8, and the fact that the probability set function $P$ is countably additive;
that is, $P$ enjoys (b) of Definition 7.

The preceding discussion may be summarized in the following
manner: A distribution function $F(x)$ is a nondecreasing function of $x$,
which is everywhere continuous from the right and has $F(-\infty) = 0$,
$F(\infty) = 1$. The probability $\Pr(a < X \le b)$ is equal to the difference
$F(b) - F(a)$. If $x$ is a discontinuity point of $F(x)$, then the probability
$\Pr(X = x)$ is equal to the jump which the distribution function has at
the point $x$. If $x$ is a continuity point of $F(x)$, then $\Pr(X = x) = 0$.

Remark. The definition of the distribution function makes it clear that the 
probability set function P determines the distribution function F. It is true, 
although not so obvious, that a probability set function P can be found from
a distribution function F. That is, P and F give the same information about 
the distribution of probability, and which function is used is a matter of 
convenience. 

Often, probability models can be constructed that make reasonable
assumptions about the probability set function and thus the
distribution function. For a simple illustration, consider an experiment
in which one chooses at random a point from the closed interval $[a, b]$,
$a < b$, that is on the real line. Thus the sample space $\mathcal{C}$ is $[a, b]$. Let the
random variable $X$ be the identity function defined on $\mathcal{C}$. Thus the
space $\mathcal{A}$ of $X$ is $\mathcal{A} = \mathcal{C}$. Suppose that it is reasonable to assume, from
the nature of the experiment, that if an interval $A$ is a subset of $\mathcal{A}$, the
probability of the event $A$ is proportional to the length of $A$. Hence,
if $A$ is the interval $[a, x]$, $x \le b$, then

$$P(A) = \Pr(X \in A) = \Pr(a \le X \le x) = c(x - a),$$

where $c$ is the constant of proportionality.

In the expression above, if we take $x = b$, we have

$$1 = \Pr(a \le X \le b) = c(b - a),$$

so $c = 1/(b - a)$. Thus we will have an appropriate probability model
if we take the distribution function of $X$, $F(x) = \Pr(X \le x)$, to be

$$F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x \le b, \\ 1, & b < x. \end{cases}$$
[FIGURE 1.7: graph of the distribution function $F(x)$]

Accordingly, the p.d.f. of $X$, $f(x) = F'(x)$, may be written

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases}$$

The derivative of $F(x)$ does not exist at $x = a$ nor at $x = b$; but the set
$\{x : x = a, b\}$ is a set of probability measure zero, and we elect to define
$f(x)$ to be equal to $1/(b - a)$ at those two points, just as a matter of
convenience. We observe that this p.d.f. is a constant on $\mathcal{A}$. If the p.d.f.
of one or more variables of the continuous type or of the discrete type
is a constant on the space $\mathcal{A}$, we say that the probability is distributed
uniformly over $\mathcal{A}$. Thus, in the example above, we say that $X$ has a
uniform distribution over the interval $[a, b]$.
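
As a quick numerical check of the uniform model, the following Python sketch compares the distribution function $F(x) = (x - a)/(b - a)$ with the proportion of simulated points that do not exceed $x$. The endpoints $a = 2$ and $b = 5$, the seed, and the sample size are illustrative assumptions and do not come from the text.

```python
import random

def uniform_cdf(x, a, b):
    """F(x) for a point chosen at random from the closed interval [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def empirical_cdf(sample, x):
    """Proportion of sample values that are <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)

random.seed(0)                      # fixed seed, only for reproducibility
a, b = 2.0, 5.0                     # hypothetical endpoints
sample = [random.uniform(a, b) for _ in range(100_000)]

for x in (2.5, 3.5, 4.75):
    # The two printed values should agree to about two decimal places.
    print(x, uniform_cdf(x, a, b), round(empirical_cdf(sample, x), 3))
```

The agreement improves as the sample size grows, which is what the proportional-to-length assumption suggests.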

We now give an illustrative example of a distribution that is neither 
of the discrete nor continuous type. 

Example 1. Let a distribution function be given by

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{x + 1}{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Then, for instance,

$$\Pr(-3 < X \le \tfrac{1}{2}) = F(\tfrac{1}{2}) - F(-3) = \tfrac{3}{4} - 0 = \tfrac{3}{4}$$

and

$$\Pr(X = 0) = F(0) - F(0-) = \tfrac{1}{2} - 0 = \tfrac{1}{2}.$$

The graph of $F(x)$ is shown in Figure 1.7. We see that $F(x)$ is not always
continuous, nor is it a step function. Accordingly, the corresponding
distribution is neither of the continuous type nor of the discrete type. It may
be described as a mixture of those types.
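
The two rules used in this example, $\Pr(a < X \le b) = F(b) - F(a)$ and $\Pr(X = b) = F(b) - F(b-)$, can be tried out with exact rational arithmetic. The sketch below assumes the distribution function of the example is $F(x) = (x + 1)/2$ on $0 \le x < 1$; the helper names and the small offset `eps` used to approximate the left-hand limit are our own choices.

```python
from fractions import Fraction

def F(x):
    """Mixed-type distribution function: 0, then (x + 1)/2 on [0, 1), then 1."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return (x + 1) / 2
    return Fraction(1)

def prob_interval(a, b):
    """Pr(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)

def jump_at(b, eps=Fraction(1, 10**9)):
    """Approximate Pr(X = b) = F(b) - F(b-) by evaluating F just left of b."""
    return F(b) - F(Fraction(b) - eps)

print(prob_interval(-3, Fraction(1, 2)))   # 3/4
print(jump_at(0))                          # 1/2, the step of F at x = 0
print(float(jump_at(Fraction(1, 2))))      # about eps/2: F is continuous there
```

A positive jump appears only at $x = 0$; at every other point the difference shrinks with `eps`, in line with $\Pr(X = x) = 0$ at continuity points.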






Distributions that are mixtures of the continuous and discrete 
types do, in fact, occur frequently in practice. For illustration, in life 
testing, suppose we know that the length of life, say X, exceeds the 
number b, but the exact value is unknown. This is called censoring. For 
instance, this can happen when a subject in a cancer study simply 
disappears; the investigator knows that the subject has lived a certain 
number of months, but the exact length of life is unknown. Or it might 
happen when an investigator does not have enough time in an 
investigation to observe the moments of deaths of all the animals, say 
rats, in some study. Censoring can also occur in the insurance industry; 
in particular, consider a loss with a limited-pay policy in which the top 
amount is exceeded but it is not known by how much. 

Example 2. Reinsurance companies are concerned with large losses
because they might agree, for illustration, to cover losses due to wind damages
that are between $2,000,000 and $10,000,000. Say that X equals the size of a
wind loss in millions of dollars, and suppose that it has the distribution
function

$$F(x) = \begin{cases} 0, & -\infty < x < 0, \\ 1 - \left(\dfrac{10}{10 + x}\right)^3, & 0 \le x < \infty. \end{cases}$$

If losses beyond $10,000,000 are reported only as 10, then the distribution
function of this censored distribution is

$$F(x) = \begin{cases} 0, & -\infty < x < 0, \\ 1 - \left(\dfrac{10}{10 + x}\right)^3, & 0 \le x < 10, \\ 1, & 10 \le x < \infty, \end{cases}$$

which has a jump of [10/(10 + 10)]^3 = 1/8 at x = 10.
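
The size of the jump created by censoring can be checked directly. This Python sketch encodes the two distribution functions of the example with exact fractions; the function names and the `cap` parameter are illustrative, and the left-hand limit at 10 is taken from the uncensored function, which is continuous there.

```python
from fractions import Fraction

def F_uncensored(x):
    """F(x) = 1 - (10 / (10 + x))^3 for x >= 0, and 0 for x < 0."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    return 1 - (Fraction(10) / (10 + x)) ** 3

def F_censored(x, cap=10):
    """Same as F_uncensored below the cap; all mass beyond it sits at the cap."""
    if Fraction(x) >= cap:
        return Fraction(1)
    return F_uncensored(x)

# F_censored(10-) equals F_uncensored(10), so the jump at 10 is their difference.
jump = F_censored(10) - F_uncensored(10)
print(jump)   # 1/8
```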


We shall now point out an important fact about a function of a
random variable. Let $X$ denote a random variable with space $\mathcal{A}$.
Consider the function $Y = u(X)$ of the random variable $X$. Since $X$ is
a function defined on a sample space $\mathcal{C}$, then $Y = u(X)$ is a composite
function defined on $\mathcal{C}$. That is, $Y = u(X)$ is itself a random variable
which has its own space $\mathcal{B} = \{y : y = u(x),\ x \in \mathcal{A}\}$ and its own
probability set function. If $y \in \mathcal{B}$, the event $Y = u(X) \le y$ occurs when,
and only when, the event $X \in A \subset \mathcal{A}$ occurs, where $A = \{x : u(x) \le y\}$.
That is, the distribution function of $Y$ is

$$G(y) = \Pr(Y \le y) = \Pr[u(X) \le y] = P(A).$$
The following example illustrates a method of finding the distribution
function and the p.d.f. of a function of a random variable. This method
is called the distribution-function technique.

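Example 3. Let $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, zero elsewhere, be the p.d.f. of
the random variable $X$. Define the random variable $Y$ by $Y = X^2$. We
wish to find the p.d.f. of $Y$. If $y \ge 0$, the probability $\Pr(Y \le y)$ is equivalent
to

$$\Pr(X^2 \le y) = \Pr(-\sqrt{y} \le X \le \sqrt{y}).$$

Accordingly, the distribution function of $Y$, $G(y) = \Pr(Y \le y)$, is given
by

$$G(y) = \begin{cases} 0, & y < 0, \\ \displaystyle\int_{-\sqrt{y}}^{\sqrt{y}} \tfrac{1}{2}\,dx = \sqrt{y}, & 0 \le y < 1, \\ 1, & 1 \le y. \end{cases}$$

Since $Y$ is a random variable of the continuous type, the p.d.f. of $Y$ is
$g(y) = G'(y)$ at all points of continuity of $g(y)$. Thus we may write

$$g(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}, & 0 < y < 1, \\ 0, & \text{elsewhere.} \end{cases}$$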
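
The distribution-function technique can be checked by simulation. The Python sketch below draws $X$ uniformly from $(-1, 1)$, forms $Y = X^2$, and compares the empirical distribution function of $Y$ with $G(y) = \sqrt{y}$. The seed, the sample size, and the evaluation points are assumptions made only for this illustration.

```python
import math
import random

random.seed(1)                       # fixed seed, only for reproducibility
n = 200_000
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]   # draws of Y = X^2

def G(y):
    """Distribution function of Y obtained by the technique: sqrt(y) on [0, 1)."""
    if y < 0:
        return 0.0
    if y < 1:
        return math.sqrt(y)
    return 1.0

def empirical_G(y):
    """Proportion of simulated Y values that are <= y."""
    return sum(1 for v in ys if v <= y) / n

for y in (0.04, 0.25, 0.81):
    # Exact value next to its simulated counterpart.
    print(y, G(y), round(empirical_G(y), 3))
```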


Remarks. Many authors use $f_X$ and $f_Y$ to denote the respective probability
density functions of the random variables $X$ and $Y$. Here we use $f$ and $g$
because we can avoid the use of subscripts. However, at other times, we will
use subscripts as in $f_X$ and $f_Y$ or even $f_1$ and $f_2$, depending upon the
circumstances. In a given example, we do not use the same symbol, without
subscripts, to represent different functions. That is, in Example 2, we do not
use $f(x)$ and $f(y)$ to represent different probability density functions.

In addition, while we ordinarily use the letter $x$ in the description of
the p.d.f. of $X$, this is not necessary at all because it is unimportant which
letter we use in describing a function. For illustration, in Example 3, we
could say that the random variable $Y$ has the p.d.f. $g(w) = 1/(2\sqrt{w})$, $0 < w < 1$,
zero elsewhere, and it would have exactly the same meaning as $Y$ has the
p.d.f. $g(y) = 1/(2\sqrt{y})$, $0 < y < 1$, zero elsewhere.

These remarks apply to other functions too, such as distribution functions.
In Example 3, we could have written the distribution function of $Y$, where
$0 \le w < 1$, as

$$F_Y(w) = \Pr(Y \le w) = \sqrt{w}.$$
EXERCISES


1.71. Given the distribution function

$$F(x) = \begin{cases} 0, & x < -1, \\ \dfrac{x + 2}{4}, & -1 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$


Sketch the graph of F(x) and then compute: (a) Pr(—(b) 
$\Pr(X = 0)$; (c) $\Pr(X = 1)$; (d) $\Pr(2 < X \le 3)$.


1.72. Let $f(x) = 1$, $0 < x < 1$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = \sqrt{X}$.

Hint: $\Pr(Y \le y) = \Pr(\sqrt{X} \le y) = \Pr(X \le y^2)$, $0 < y < 1$.


1.73. Let $f(x) = x/6$, $x = 1, 2, 3$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = X^2$.

Hint: Note that $X$ is a random variable of the discrete type.

1.74. Let $f(x) = (4 - x)/16$, $-2 < x < 2$, zero elsewhere, be the p.d.f. of $X$.

(a) Sketch the distribution function and the p.d.f. of $X$ on the same set of
axes.

(b) If ⑶， compute Pr (K< 1). 

(c) If Z^X 2 , compute Pr (Z<\). 

1.75. Let $X$ have the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Find the
distribution function and p.d.f. of $Y = X^2$.

1.76. Let $X$ have the p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Find the
distribution function and p.d.f. of $Y = -2 \ln X^4$.

1.77. Explain why, with $h > 0$, the two limits, $\lim_{h \to 0} \Pr(b - h < X \le b)$ and
$\lim_{h \to 0} F(b - h)$, exist.

Hint: Note that $\Pr(b - h < X \le b)$ is bounded below by zero and $F(b - h)$
is bounded above by both $F(b)$ and 1.

1.78. Let $F(x)$ be the distribution function of the random variable $X$. If $m$ is
a number such that $F(m) = \tfrac{1}{2}$, show that $m$ is a median of the distribution.

1.79. Let $f(x) = \tfrac{1}{3}$, $-1 < x < 2$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = X^2$.

Hint: Consider $\Pr(X^2 \le y)$ for two cases: $0 \le y < 1$ and $1 \le y < 4$.



1.8 Expectation of a Random Variable 

Let $X$ be a random variable having a p.d.f. $f(x)$ such that we have
certain absolute convergence; namely, in the discrete case,

$$\sum_x |x| f(x) \quad \text{converges to a finite limit,}$$

or, in the continuous case,

$$\int_{-\infty}^{\infty} |x| f(x)\,dx \quad \text{converges to a finite limit.}$$

The expectation of a random variable is

$$E(X) = \sum_x x f(x), \quad \text{in the discrete case,}$$

or

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad \text{in the continuous case.}$$

Sometimes the expectation $E(X)$ is called the mathematical expectation
of $X$ or the expected value of $X$.

Remark. The terminology of expectation or expected value has its
origin in games of chance. This can be illustrated as follows: Four small similar
chips, numbered 1, 1, 1, and 2, respectively, are placed in a bowl and are mixed.
A player is blindfolded and is to draw a chip from the bowl. If she draws one
of the three chips numbered 1, she will receive one dollar. If she draws the chip
numbered 2, she will receive two dollars. It seems reasonable to assume that
the player has a "3/4 claim" on the $1 and a "1/4 claim" on the $2. Her "total
claim" is (1)(3/4) + 2(1/4) = 5/4, that is, $1.25. Thus the expectation of X is precisely
the player's claim in this game.

Example 1. Let the random variable $X$ of the discrete type have the p.d.f.
given by the table

    x        1       2       3       4
    f(x)    4/10    1/10    3/10    2/10

Here $f(x) = 0$ if $x$ is not equal to one of the first four positive integers. This
illustrates the fact that there is no need to have a formula to describe a p.d.f.
We have

$$E(X) = (1)\left(\tfrac{4}{10}\right) + 2\left(\tfrac{1}{10}\right) + 3\left(\tfrac{3}{10}\right) + 4\left(\tfrac{2}{10}\right) = \tfrac{23}{10} = 2.3.$$
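
A p.d.f. given by a table translates directly into a dictionary, and the discrete expectation becomes a one-line sum. The sketch below, in Python with exact fractions, reproduces the value 2.3 for the table above and the "total claim" of the four-chip game in the preceding Remark; the names `expectation`, `table_pdf`, and `chip_pdf` are illustrative.

```python
from fractions import Fraction

def expectation(pdf, u=lambda x: x):
    """Discrete-case expectation: the sum of u(x) f(x) over the points x."""
    return sum(u(x) * p for x, p in pdf.items())

# Table of Example 1: x = 1, 2, 3, 4 with probabilities 4/10, 1/10, 3/10, 2/10.
table_pdf = {1: Fraction(4, 10), 2: Fraction(1, 10),
             3: Fraction(3, 10), 4: Fraction(2, 10)}

# Chips numbered 1, 1, 1, 2: payoff 1 with probability 3/4, payoff 2 with 1/4.
chip_pdf = {1: Fraction(3, 4), 2: Fraction(1, 4)}

print(expectation(table_pdf), float(expectation(table_pdf)))   # 23/10 2.3
print(expectation(chip_pdf))                                   # 5/4
```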
Example 2. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} 4x^3, & 0 < x < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Then

$$E(X) = \int_0^1 x(4x^3)\,dx = \int_0^1 4x^4\,dx = \left[\frac{4x^5}{5}\right]_0^1 = \frac{4}{5}.$$


Let us consider a function of a random variable $X$ with space
$\mathcal{A}$. Call this function $Y = u(X)$. For convenience, let $X$ be of
the continuous type and $y = u(x)$ be a continuous increasing function
of $X$ with an inverse function $x = w(y)$, which, of course, is
also increasing. So $Y$ is a random variable and its distribution function
is

$$G(y) = \Pr(Y \le y) = \Pr[u(X) \le y] = \Pr[X \le w(y)] = \int_{-\infty}^{w(y)} f(x)\,dx,$$

where $f(x)$ is the p.d.f. of $X$. By one form of the fundamental theorem
of calculus,

$$g(y) = G'(y) = \begin{cases} f[w(y)]\,w'(y), & y \in \mathcal{B}, \\ 0, & \text{elsewhere,} \end{cases}$$

where

$$\mathcal{B} = \{y : y = u(x),\ x \in \mathcal{A}\}.$$

By definition, given absolute convergence, the expected value of $Y$ is

$$E(Y) = \int_{-\infty}^{\infty} y\,g(y)\,dy.$$

Since $Y = u(X)$, we might ask how $E(Y)$ compares to the integral

$$I = \int_{-\infty}^{\infty} u(x) f(x)\,dx.$$

To answer this, change the variable of integration through $y = u(x)$ or,
equivalently, $x = w(y)$. Since

$$\frac{dx}{dy} = w'(y) > 0,$$

we have

$$I = \int_{-\infty}^{\infty} y\,f[w(y)]\,w'(y)\,dy = \int_{-\infty}^{\infty} y\,g(y)\,dy.$$

That is, in this special case,

$$E(Y) = \int_{-\infty}^{\infty} y\,g(y)\,dy = \int_{-\infty}^{\infty} u(x) f(x)\,dx.$$


However, this is true more generally: it makes no difference
whether $X$ is of the discrete or continuous type, and $Y = u(X)$ need not
be an increasing function of $X$ (Exercise 1.87 illustrates this).

So if $Y = u(X)$ has an expectation, we can find it from

$$E[u(X)] = \int_{-\infty}^{\infty} u(x) f(x)\,dx \tag{1}$$

in the continuous case, and

$$E[u(X)] = \sum_x u(x) f(x) \tag{2}$$

in the discrete case. Accordingly, we say that $E[u(X)]$ is the expectation
(mathematical expectation or expected value) of $u(X)$.
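
The equality of the two routes to $E(Y)$ can be seen numerically. The Python sketch below takes, as an assumed illustration, $f(x) = 2x$ on $(0, 1)$ and the increasing function $u(x) = x^2$, so that $w(y) = \sqrt{y}$ and $g(y) = f[w(y)]\,w'(y)$. It approximates both integrals with a simple midpoint rule; the rule and the number of subintervals are our own choices.

```python
import math

def midpoint(func, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

f = lambda x: 2.0 * x                      # assumed p.d.f. of X on (0, 1)
u = lambda x: x * x                        # increasing on (0, 1)
w = lambda y: math.sqrt(y)                 # inverse function of u
w_prime = lambda y: 0.5 / math.sqrt(y)     # derivative of w
g = lambda y: f(w(y)) * w_prime(y)         # p.d.f. of Y = u(X)

via_g = midpoint(lambda y: y * g(y), 0.0, 1.0)      # integral of y g(y)
via_f = midpoint(lambda x: u(x) * f(x), 0.0, 1.0)   # integral of u(x) f(x)
print(via_g, via_f)   # both close to 0.5
```

Working with $f$ directly avoids having to find $g$ at all, which is the practical point of expressions (1) and (2).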

Remark. If the mathematical expectation of $Y$ exists, recall that the
integral (or sum)

$$\int_{-\infty}^{\infty} |y|\,g(y)\,dy \quad \text{or} \quad \sum_y |y|\,g(y)$$

exists. Hence the existence of $E[u(X)]$ implies that the corresponding integral
(or sum) converges absolutely.


Next, we shall point out some fairly obvious but useful facts about
expectations when they exist.

1. If $k$ is a constant, then $E(k) = k$. This follows from expression (1)
[or (2)] upon setting $u = k$ and recalling that an integral (or sum)
of a constant times a function is the constant times the integral (or
sum) of the function. Of course, the integral (or sum) of the function
$f$ is 1.

2. If $k$ is a constant and $v$ is a function, then $E(kv) = kE(v)$. This follows
from expression (1) [or (2)] upon setting $u = kv$ and rewriting
expression (1) [or (2)] as $k$ times the integral (or sum) of the product
$vf$.

3. If $k_1$ and $k_2$ are constants and $v_1$ and $v_2$ are functions, then
$E(k_1 v_1 + k_2 v_2) = k_1 E(v_1) + k_2 E(v_2)$. This, too, follows from
expression (1) [or (2)] upon setting $u = k_1 v_1 + k_2 v_2$ because the integral
(or sum) of $(k_1 v_1 + k_2 v_2) f$ is equal to the integral (or sum) of $k_1 v_1 f$
plus the integral (or sum) of $k_2 v_2 f$. Repeated application
of this property shows that if $k_1, k_2, \ldots, k_m$ are constants and
$v_1, v_2, \ldots, v_m$ are functions, then

$$E(k_1 v_1 + k_2 v_2 + \cdots + k_m v_m) = k_1 E(v_1) + k_2 E(v_2) + \cdots + k_m E(v_m).$$

This property of expectation leads us to characterize the symbol $E$
as a linear operator.


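Example 3. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} 2(1 - x), & 0 < x < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Then

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 (x)\,2(1 - x)\,dx = \tfrac{1}{3},$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 (x^2)\,2(1 - x)\,dx = \tfrac{1}{6},$$

and, of course,

$$E(6X + 3X^2) = 6\left(\tfrac{1}{3}\right) + 3\left(\tfrac{1}{6}\right) = \tfrac{5}{2}.$$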
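
Because the p.d.f. here is a polynomial on $(0, 1)$, the linearity of $E$ can be verified exactly. The following Python sketch represents polynomials by coefficient lists and integrates them term by term over $[0, 1]$; the helper names `poly_mul`, `integrate_poly_01`, and `E` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate_poly_01(coeffs):
    """Exact integral over [0, 1] of c0 + c1 x + c2 x^2 + ..."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))

f = [2, -2]                         # f(x) = 2(1 - x) = 2 - 2x on (0, 1)

def E(u):
    """E[u(X)] as the integral of u(x) f(x) over (0, 1), for polynomial u."""
    return integrate_poly_01(poly_mul(u, f))

EX = E([0, 1])                      # E(X)
EX2 = E([0, 0, 1])                  # E(X^2)
direct = E([0, 6, 3])               # E(6X + 3X^2) computed in one integral
by_linearity = 6 * EX + 3 * EX2     # the same quantity from the two moments
print(EX, EX2, direct, by_linearity)   # 1/3 1/6 5/2 5/2
```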


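Example 4. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} \dfrac{x}{6}, & x = 1, 2, 3, \\ 0, & \text{elsewhere.} \end{cases}$$

Then

$$E(X^3) = \sum_x x^3 f(x) = \sum_{x=1}^{3} x^3\,\frac{x}{6} = \frac{1}{6} + \frac{16}{6} + \frac{81}{6} = \frac{98}{6}.$$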

Example 5. Let us divide, at random, a horizontal line segment of length
5 into two parts. If $X$ is the length of the left-hand part, it is reasonable to
assume that $X$ has the p.d.f.

$$f(x) = \begin{cases} \tfrac{1}{5}, & 0 < x < 5, \\ 0, & \text{elsewhere.} \end{cases}$$

The expected value of the length $X$ is $E(X) = \tfrac{5}{2}$ and the expected value of the
length $5 - X$ is $E(5 - X) = \tfrac{5}{2}$. But the expected value of the product of the
two lengths is equal to

$$E[X(5 - X)] = \int_0^5 x(5 - x)\left(\tfrac{1}{5}\right) dx = \tfrac{25}{6} \ne \left(\tfrac{5}{2}\right)^2.$$

That is, in general, the expected value of a product is not equal to the product
of the expected values.
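
The comparison in this example can be reproduced with exact fractions. The sketch below uses the fact that, for $X$ uniform on $(0, 5)$, $E(X^k) = 5^k/(k + 1)$, and then applies linearity to $E[X(5 - X)] = 5E(X) - E(X^2)$. The names `LENGTH` and `moment` are our own.

```python
from fractions import Fraction

LENGTH = Fraction(5)

def moment(k):
    """E(X^k) for X uniform on (0, LENGTH): integral of x^k / LENGTH."""
    return LENGTH ** k / (k + 1)

e_left = moment(1)                            # E(X)
e_right = LENGTH - e_left                     # E(5 - X), by linearity
e_product = LENGTH * moment(1) - moment(2)    # E[X(5 - X)] = 5E(X) - E(X^2)

print(e_product, e_left * e_right)            # 25/6 25/4
```

Linearity lets us split $E(5X - X^2)$ into moments, but nothing comparable lets us split the expectation of a product into a product of expectations.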

Example 6. A bowl contains five chips, which cannot be distinguished by 
a sense of touch alone. Three of the chips are marked $1 each and the 
remaining two are marked $4 each. A player is blindfolded and draws, at 
random and without replacement, two chips from the bowl. The player is paid 
an amount equal to the sum of the values of the two chips that he draws and 
the game is over. If it costs $4.75 to play this game, would we care to participate 
for any protracted period of time? Because we are unable to distinguish the 
chips by sense of touch, we assume that each of the 10 pairs that can be drawn 
has the same probability of being drawn. Let the random variable X be the 
number of chips, of the two to be chosen, that are marked $1. Then, under 
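our assumption, X has the hypergeometric p.d.f.

$$f(x) = \begin{cases} \dfrac{\binom{3}{x}\binom{2}{2 - x}}{\binom{5}{2}}, & x = 0, 1, 2, \\ 0, & \text{elsewhere.} \end{cases}$$

If X = x, the player receives u(x) = x + 4(2 - x) = 8 - 3x dollars. Hence his
mathematical expectation is equal to

$$E[8 - 3X] = \sum_{x=0}^{2} (8 - 3x) f(x) = \frac{44}{10},$$

or $4.40.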
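
The expectation of the payoff can be recomputed from the hypergeometric probabilities. This Python sketch uses `math.comb` for the binomial coefficients and exact fractions for the sum; the function names are illustrative, and the cost of 4.75 dollars is the figure stated in the example.

```python
from fractions import Fraction
from math import comb

def f(x):
    """Hypergeometric p.d.f.: x chips marked $1 among 2 drawn from 3 + 2 chips."""
    if x in (0, 1, 2):
        return Fraction(comb(3, x) * comb(2, 2 - x), comb(5, 2))
    return Fraction(0)

def payoff(x):
    """Dollars received when x of the two chips are marked $1."""
    return x + 4 * (2 - x)          # equals 8 - 3x

expected_payoff = sum(payoff(x) * f(x) for x in range(3))
cost = Fraction(475, 100)

print(expected_payoff, float(expected_payoff))   # 22/5 4.4
print(float(expected_payoff - cost))             # negative: a loss per game on average
```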

EXERCISES

1.80. Let $X$ have the p.d.f. $f(x) = (x + 2)/18$, $-2 < x < 4$, zero elsewhere.
Find $E(X)$, $E[(X + 2)^3]$, and $E[6X - 2(X + 2)^3]$.

1.81. Suppose that $f(x) = \tfrac{1}{5}$, $x = 1, 2, 3, 4, 5$, zero elsewhere, is the p.d.f. of
the discrete type of random variable $X$. Compute $E(X)$ and $E(X^2)$. Use
these two results to find $E[(X + 2)^2]$ by writing $(X + 2)^2 = X^2 + 4X + 4$.

1.82. Let $X$ be a number selected at random from a set of numbers
$\{51, 52, 53, \ldots, 100\}$. Approximate $E(1/X)$.

Hint: Find reasonable upper and lower bounds by finding integrals
bounding $E(1/X)$.

1.83. Let the p.d.f. $f(x)$ be positive at $x = -1, 0, 1$ and zero elsewhere.

(a) lf/(0) = i ， find£ , (jr 2 ). 

(b) If/(0) = i and if E(X) = \, determine 双 一 1) and/(I). 

1.84. Let $X$ have the p.d.f. $f(x) = 3x^2$, $0 < x < 1$, zero elsewhere. Consider a
random rectangle whose sides are $X$ and $(1 - X)$. Determine the expected
value of the area of the rectangle.

1.85. A bowl contains 10 chips, of which 8 are marked $2 each and 2 are
marked $5 each. Let a person choose, at random and without replacement, 
3 chips from this bowl. If the person is to receive the sum of the resulting 
amounts, find his expectation. 

1.86. Let $X$ be a random variable of the continuous type that has p.d.f. $f(x)$.
If $m$ is the unique median of the distribution of $X$ and $b$ is a real constant,
show that

$$E(|X - b|) = E(|X - m|) + 2\int_m^b (b - x) f(x)\,dx,$$

provided that the expectations exist. For what value of $b$ is $E(|X - b|)$ a
minimum?

1.87. Let $f(x) = 2x$, $0 < x < 1$, zero elsewhere, be the p.d.f. of $X$.

(a) Compute $E(1/X)$.

(b) Find the distribution function and the p.d.f. of $Y = 1/X$.

(c) Compute $E(Y)$ and compare this result with the answer obtained in
part (a).

Hint: Here $\mathcal{A} = \{x : 0 < x < 1\}$; find $\mathcal{B}$.

1.88. Two distinct integers are chosen at random and without replacement
from the first six positive integers. Compute the expected value of the
absolute value of the difference of these two numbers.

1.9 Some Special Expectations 

Certain expectations, if they exist, have special names and symbols
to represent them. First, let $X$ be a random variable of the discrete type
having a p.d.f. $f(x)$. Then

$$E(X) = \sum_x x f(x).$$

If the discrete points of the space of positive probability density are
$a_1, a_2, a_3, \ldots$, then

$$E(X) = a_1 f(a_1) + a_2 f(a_2) + a_3 f(a_3) + \cdots.$$

This sum of products is seen to be a "weighted average" of the values
$a_1, a_2, a_3, \ldots$, the "weight" associated with each $a_i$ being $f(a_i)$. This
suggests that we call $E(X)$ the arithmetic mean of the values of $X$,
or, more simply, the mean value of $X$ (or the mean value of the
distribution).

The mean value $\mu$ of a random variable $X$ is defined, when it exists,
to be $\mu = E(X)$, where $X$ is a random variable of the discrete or of the
continuous type.

Another special expectation is obtained by taking $u(X) = (X - \mu)^2$.
If, initially, $X$ is a random variable of the discrete type having a p.d.f.
$f(x)$, then

$$E[(X - \mu)^2] = \sum_x (x - \mu)^2 f(x) = (a_1 - \mu)^2 f(a_1) + (a_2 - \mu)^2 f(a_2) + \cdots,$$

if $a_1, a_2, \ldots$ are the discrete points of the space of positive probability
density. This sum of products may be interpreted as a "weighted
average" of the squares of the deviations of the numbers $a_1, a_2, \ldots$
from the mean value $\mu$ of those numbers where the "weight" associated
with each $(a_i - \mu)^2$ is $f(a_i)$. This mean value of the square of the
deviation of $X$ from its mean value $\mu$ is called the variance of $X$ (or the
variance of the distribution).

The variance of $X$ will be denoted by $\sigma^2$, and we define $\sigma^2$, if it exists,
by $\sigma^2 = E[(X - \mu)^2]$, whether $X$ is a discrete or a continuous type of
random variable. Sometimes the variance of $X$ is written var$(X)$.

It is worthwhile to observe that var$(X)$ equals

$$\sigma^2 = E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2);$$

and since $E$ is a linear operator,

$$\sigma^2 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$

This frequently affords an easier way of computing the variance of $X$.
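
The agreement between the definition $E[(X - \mu)^2]$ and the shortcut $E(X^2) - \mu^2$ is easy to confirm for a small discrete distribution. The Python sketch below uses, purely as an illustration, the p.d.f. $f(x) = x/6$, $x = 1, 2, 3$, with exact fractions; the variable names are our own.

```python
from fractions import Fraction

pdf = {x: Fraction(x, 6) for x in (1, 2, 3)}      # f(x) = x/6, x = 1, 2, 3

mu = sum(x * p for x, p in pdf.items())           # mean E(X)

# Variance from the definition E[(X - mu)^2].
var_definition = sum((x - mu) ** 2 * p for x, p in pdf.items())

# Variance from the shortcut E(X^2) - mu^2.
var_shortcut = sum(x * x * p for x, p in pdf.items()) - mu ** 2

print(mu, var_definition, var_shortcut)           # 7/3 5/9 5/9
```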
It is customary to call $\sigma$ (the positive square root of the variance)
the standard deviation of $X$ (or the standard deviation of the
distribution). The number $\sigma$ is sometimes interpreted as a measure of
the dispersion of the points of the space relative to the mean value
$\mu$. We note that if the space contains only one point $x$ for which
$f(x) > 0$, then $\sigma = 0$.

Remark. Let the random variable $X$ of the continuous type have the
p.d.f. $f(x) = 1/(2a)$, $-a < x < a$, zero elsewhere, so that $\sigma = a/\sqrt{3}$ is the
standard deviation of the distribution of $X$. Next, let the random variable $Y$
of the continuous type have the p.d.f. $g(y) = 1/(4a)$, $-2a < y < 2a$, zero
elsewhere, so that $\sigma = 2a/\sqrt{3}$ is the standard deviation of the distribution of
$Y$. Here the standard deviation of $Y$ is greater than that of $X$; this reflects the
fact that the probability for $Y$ is more widely distributed (relative to the mean
zero) than is the probability for $X$.

We next define a third special mathematical expectation, called the
moment-generating function (abbreviated m.g.f.) of a random variable
$X$. Suppose that there is a positive number $h$ such that for $-h < t < h$
the mathematical expectation $E(e^{tX})$ exists. Thus

$$E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx,$$

if $X$ is a continuous type of random variable, or

$$E(e^{tX}) = \sum_x e^{tx} f(x),$$

if $X$ is a discrete type of random variable. This expectation is called the
moment-generating function (m.g.f.) of $X$ (or of the distribution) and
is denoted by $M(t)$. That is,

$$M(t) = E(e^{tX}).$$

It is evident that if we set $t = 0$, we have $M(0) = 1$. As will be seen by
example, not every distribution has an m.g.f., but it is difficult to 
overemphasize the importance of an m.g.f., when it does exist. This 
importance stems from the fact that the m.g.f. is unique and completely 
determines the distribution of the random variable; thus, if two random 
variables have the same m.g.f., they have the same distribution. This 
property of an m.g.f. will be very useful in subsequent chapters. Proof 
of the uniqueness of the m.g.f. is based on the theory of transforms in 
analysis, and therefore we merely assert this uniqueness. 
Although the fact that an m.g.f. (when it exists) completely
determines the distribution of one random variable will not be proved,
it does seem desirable to try to make the assertion plausible. This can
be done if the random variable is of the discrete type. For example, let
it be given that

$$M(t) = \tfrac{1}{10} e^{t} + \tfrac{2}{10} e^{2t} + \tfrac{3}{10} e^{3t} + \tfrac{4}{10} e^{4t}$$

is, for all real values of $t$, the m.g.f. of a random variable $X$ of the
discrete type. If we let $f(x)$ be the p.d.f. of $X$ and let $a, b, c, d, \ldots$ be
the discrete points in the space of $X$ at which $f(x) > 0$, then

$$M(t) = \sum_x e^{tx} f(x),$$

or

$$\tfrac{1}{10} e^{t} + \tfrac{2}{10} e^{2t} + \tfrac{3}{10} e^{3t} + \tfrac{4}{10} e^{4t} = f(a) e^{at} + f(b) e^{bt} + \cdots.$$

Because this is an identity for all real values of $t$, it seems that the
right-hand member should consist of but four terms and that each of
the four should equal, respectively, one of those in the left-hand
member; hence we may take $a = 1$, $f(a) = \tfrac{1}{10}$; $b = 2$, $f(b) = \tfrac{2}{10}$; $c = 3$,
$f(c) = \tfrac{3}{10}$; $d = 4$, $f(d) = \tfrac{4}{10}$. Or, more simply, the p.d.f. of $X$ is

$$f(x) = \begin{cases} \dfrac{x}{10}, & x = 1, 2, 3, 4, \\ 0, & \text{elsewhere.} \end{cases}$$

On the other hand, let $X$ be a random variable of the continuous
type and let it be given that

$$M(t) = \frac{1}{1 - t}, \quad t < 1,$$

is the m.g.f. of $X$. That is, we are given

$$\frac{1}{1 - t} = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, \quad t < 1.$$

It is not at all obvious how $f(x)$ is found. However, it is easy to see that
a distribution with p.d.f.

$$f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0, & \text{elsewhere} \end{cases}$$

has the m.g.f. $M(t) = (1 - t)^{-1}$, $t < 1$. Thus the random variable $X$
has a distribution with this p.d.f. in accordance with the assertion of
the uniqueness of the m.g.f.

Since a distribution that has an m.g.f. $M(t)$ is completely determined
by $M(t)$, it would not be surprising if we could obtain some
properties of the distribution directly from $M(t)$. For example, the
existence of $M(t)$ for $-h < t < h$ implies that derivatives of all order
exist at $t = 0$. Thus, using a theorem in analysis that allows us to
change the order of differentiation and integration, we have

$$\frac{dM(t)}{dt} = M'(t) = \int_{-\infty}^{\infty} x e^{tx} f(x)\,dx,$$

if $X$ is of the continuous type, or

$$\frac{dM(t)}{dt} = M'(t) = \sum_x x e^{tx} f(x),$$

if $X$ is of the discrete type. Upon setting $t = 0$, we have in either case

$$M'(0) = E(X) = \mu.$$

The second derivative of $M(t)$ is

$$M''(t) = \int_{-\infty}^{\infty} x^2 e^{tx} f(x)\,dx \quad \text{or} \quad \sum_x x^2 e^{tx} f(x),$$

so that $M''(0) = E(X^2)$. Accordingly, the var$(X)$ equals

$$\sigma^2 = E(X^2) - \mu^2 = M''(0) - [M'(0)]^2.$$

For example, if $M(t) = (1 - t)^{-1}$, $t < 1$, as in the illustration above,
then

$$M'(t) = (1 - t)^{-2} \quad \text{and} \quad M''(t) = 2(1 - t)^{-3}.$$

Hence

$$\mu = M'(0) = 1$$

and

$$\sigma^2 = M''(0) - \mu^2 = 2 - 1 = 1.$$

Of course, we could have computed $\mu$ and $\sigma^2$ from the p.d.f. by

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{and} \quad \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2,$$

respectively. Sometimes one way is easier than the other.
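
The relations $\mu = M'(0)$ and $\sigma^2 = M''(0) - [M'(0)]^2$ can also be checked numerically when differentiating by hand is inconvenient. The Python sketch below applies central finite differences to $M(t) = (1 - t)^{-1}$; the difference formulas and the step size `h` are our own choices and give approximations only.

```python
def M(t):
    """m.g.f. M(t) = 1 / (1 - t), defined for t < 1."""
    if t >= 1:
        raise ValueError("M(t) is defined only for t < 1")
    return 1.0 / (1.0 - t)

def first_derivative(func, t=0.0, h=1e-4):
    """Central-difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

def second_derivative(func, t=0.0, h=1e-4):
    """Central-difference approximation of func''(t)."""
    return (func(t + h) - 2 * func(t) + func(t - h)) / (h * h)

mu = first_derivative(M)                    # approximates M'(0)
variance = second_derivative(M) - mu ** 2   # approximates M''(0) - [M'(0)]^2
print(round(mu, 6), round(variance, 6))     # 1.0 1.0
```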

In general, if $m$ is a positive integer and if $M^{(m)}(t)$ means the $m$th
derivative of $M(t)$, we have, by repeated differentiation with respect to
$t$,

$$M^{(m)}(0) = E(X^m).$$

Now

$$E(X^m) = \int_{-\infty}^{\infty} x^m f(x)\,dx \quad \text{or} \quad \sum_x x^m f(x),$$

and integrals (or sums) of this sort are, in mechanics, called moments.
Since $M(t)$ generates the values of $E(X^m)$, $m = 1, 2, 3, \ldots$, it is called
the moment-generating function (m.g.f.). In fact, we shall sometimes
call $E(X^m)$ the $m$th moment of the distribution, or the $m$th moment
of $X$.

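Example 1. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} \tfrac{1}{2}(x + 1), & -1 < x < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Then the mean value of $X$ is

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-1}^{1} x\,\frac{x + 1}{2}\,dx = \frac{1}{3},$$

while the variance of $X$ is

$$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_{-1}^{1} x^2\,\frac{x + 1}{2}\,dx - \left(\frac{1}{3}\right)^2 = \frac{2}{9}.$$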

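Example 2. If $X$ has the p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{x^2}, & 1 < x < \infty, \\ 0, & \text{elsewhere,} \end{cases}$$

then the mean value of $X$ does not exist, since

$$\int_1^{\infty} |x|\,\frac{1}{x^2}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x}\,dx = \lim_{b \to \infty} (\ln b - \ln 1)$$

does not exist.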
Example 3. It is known that the series

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots$$

converges to $\pi^2/6$. Then

$$f(x) = \begin{cases} \dfrac{6}{\pi^2 x^2}, & x = 1, 2, 3, \ldots, \\ 0, & \text{elsewhere,} \end{cases}$$

is the p.d.f. of a discrete type of random variable $X$. The m.g.f. of this
distribution, if it exists, is given by

$$M(t) = E(e^{tX}) = \sum_x e^{tx} f(x) = \sum_{x=1}^{\infty} \frac{6 e^{tx}}{\pi^2 x^2}.$$

The ratio test may be used to show that this series diverges if $t > 0$. Thus there
does not exist a positive number $h$ such that $M(t)$ exists for $-h < t < h$.
Accordingly, the distribution having the p.d.f. $f(x)$ of this example does not
have an m.g.f.
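
The ratio-test argument can be made concrete. In the Python sketch below, `term(x, t)` is the $x$th term $6e^{tx}/(\pi^2 x^2)$ of the series and `ratio(x, t)` is the quotient of successive terms, which equals $e^{t}\,[x/(x + 1)]^2$ and so tends to $e^{t} > 1$ when $t > 0$. The particular values $t = 0.1$ and $x = 1000$ are arbitrary illustrations.

```python
import math

def term(x, t):
    """x-th term of the series for M(t): 6 e^{tx} / (pi^2 x^2)."""
    return 6.0 * math.exp(t * x) / (math.pi ** 2 * x ** 2)

def ratio(x, t):
    """Ratio of successive terms; it approaches e^t as x grows."""
    return term(x + 1, t) / term(x, t)

# With t = 0 the terms are the probabilities f(x), and they sum to 1.
partial_sum_t0 = sum(term(x, 0.0) for x in range(1, 100_001))

print(round(partial_sum_t0, 4))          # close to 1
print(ratio(1000, 0.1), math.exp(0.1))   # ratio exceeds 1 and is near e^0.1
```

For any $t > 0$, however small, the terms eventually grow, so the series cannot converge on an interval around zero.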


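Example 4. Let $X$ have the m.g.f. $M(t) = e^{t^2/2}$, $-\infty < t < \infty$. We can
differentiate $M(t)$ any number of times to find the moments of $X$. However,
it is instructive to consider this alternative method. The function $M(t)$ is
represented by the following MacLaurin's series.

$$e^{t^2/2} = 1 + \frac{1}{1!}\left(\frac{t^2}{2}\right) + \frac{1}{2!}\left(\frac{t^2}{2}\right)^2 + \cdots + \frac{1}{k!}\left(\frac{t^2}{2}\right)^k + \cdots$$

$$= 1 + \frac{1}{2!}\,t^2 + \frac{(3)(1)}{4!}\,t^4 + \cdots + \frac{(2k - 1)\cdots(3)(1)}{(2k)!}\,t^{2k} + \cdots.$$

In general, the MacLaurin's series for $M(t)$ is

$$M(t) = M(0) + \frac{M'(0)}{1!}\,t + \frac{M''(0)}{2!}\,t^2 + \cdots + \frac{M^{(m)}(0)}{m!}\,t^m + \cdots$$

$$= 1 + \frac{E(X)}{1!}\,t + \frac{E(X^2)}{2!}\,t^2 + \cdots + \frac{E(X^m)}{m!}\,t^m + \cdots.$$

Thus the coefficient of $t^m/m!$ in the MacLaurin's series representation of $M(t)$
is $E(X^m)$. So, for our particular $M(t)$, we have

$$E(X^{2k}) = (2k - 1)(2k - 3)\cdots(3)(1) = \frac{(2k)!}{2^k\,k!},$$

$k = 1, 2, 3, \ldots$, and $E(X^{2k-1}) = 0$, $k = 1, 2, 3, \ldots$.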
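
The two expressions for the even moments, the product of odd integers $(2k - 1)(2k - 3)\cdots(3)(1)$ and the closed form $(2k)!/(2^k k!)$, can be compared for the first several values of $k$. The Python sketch below does this with integer arithmetic; the function names are illustrative.

```python
from math import factorial, prod

def even_moment_closed_form(k):
    """E(X^{2k}) written as (2k)! / (2^k k!)."""
    return factorial(2 * k) // (2 ** k * factorial(k))

def even_moment_odd_product(k):
    """E(X^{2k}) written as the product (2k - 1)(2k - 3)...(3)(1)."""
    return prod(range(1, 2 * k, 2))

for k in range(1, 6):
    # Second and fourth columns should agree: 1, 3, 15, 105, 945.
    print(2 * k, even_moment_closed_form(k), even_moment_odd_product(k))
```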
Remarks. In a more advanced course, we would not work with the m.g.f.
because so many distributions do not have moment-generating functions.
Instead, we would let $i$ denote the imaginary unit, $t$ an arbitrary real, and we
would define $\varphi(t) = E(e^{itX})$. This expectation exists for every distribution and
it is called the characteristic function of the distribution. To see why $\varphi(t)$ exists
for all real $t$, we note, in the continuous case, that its absolute value

$$|\varphi(t)| = \left| \int_{-\infty}^{\infty} e^{itx} f(x)\,dx \right| \le \int_{-\infty}^{\infty} |e^{itx} f(x)|\,dx.$$

However, $|f(x)| = f(x)$ since $f(x)$ is nonnegative and

$$|e^{itx}| = |\cos tx + i \sin tx| = \sqrt{\cos^2 tx + \sin^2 tx} = 1.$$

Thus

$$|\varphi(t)| \le \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Accordingly, the integral for $\varphi(t)$ exists for all real values of $t$. In the discrete
case, a summation would replace the integral.
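
The bound $|\varphi(t)| \le 1$ and the relation $iE(X) = \varphi'(0)$ can be illustrated in the discrete case with Python's `cmath` module. The sketch below uses, as an assumed example, the p.d.f. $f(x) = x/10$, $x = 1, 2, 3, 4$; the grid of $t$ values and the finite-difference step are our own choices.

```python
import cmath

pdf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}     # f(x) = x/10, x = 1, 2, 3, 4

def phi(t):
    """Characteristic function E(e^{itX}) of the discrete distribution."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pdf.items())

# Largest modulus of phi on a grid of t values from -10 to 10.
largest = max(abs(phi(k / 10)) for k in range(-100, 101))

# Central-difference estimate of phi'(0), to be compared with i E(X).
h = 1e-5
dphi0 = (phi(h) - phi(-h)) / (2 * h)
mean = sum(x * p for x, p in pdf.items())

print(largest)            # approximately 1, attained at t = 0
print(dphi0, 1j * mean)   # both approximately 3j
```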

Every distribution has a unique characteristic function; and to each
characteristic function there corresponds a unique distribution of
probability. If $X$ has a distribution with characteristic function $\varphi(t)$, then, for
instance, if $E(X)$ and $E(X^2)$ exist, they are given, respectively, by $iE(X) = \varphi'(0)$
and $i^2 E(X^2) = \varphi''(0)$. Readers who are familiar with complex-valued
functions may write $\varphi(t) = M(it)$ and, throughout this book, may prove
certain theorems in complete generality.

Those who have studied Laplace and Fourier transforms will note a
similarity between these transforms and $M(t)$ and $\varphi(t)$; it is the uniqueness of
these transforms that allows us to assert the uniqueness of each of the
moment-generating and characteristic functions.


EXERCISES 

1.89. Find the mean and variance, if they exist, of each of the following
distributions.

(a) $f(x) = \dfrac{3!}{x!\,(3 - x)!}\left(\tfrac{1}{2}\right)^3$, $x = 0, 1, 2, 3$, zero elsewhere.

(b) $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = 2/x^3$, $1 < x < \infty$, zero elsewhere.

1.90. Let $f(x) = \left(\tfrac{1}{2}\right)^x$, $x = 1, 2, 3, \ldots$, zero elsewhere, be the p.d.f. of the
random variable $X$. Find the m.g.f., the mean, and the variance of $X$.

1.91. For each of the following probability density functions, compute
$\Pr(\mu - 2\sigma < X < \mu + 2\sigma)$.

(a) $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere.

(b) $f(x) = \left(\tfrac{1}{2}\right)^x$, $x = 1, 2, 3, \ldots$, zero elsewhere.

1.92. If the variance of the random variable X exists, show that 

1.93. Let a random variable $X$ of the continuous type have a p.d.f. $f(x)$
whose graph is symmetric with respect to $x = c$. If the mean value of $X$
exists, show that $E(X) = c$.

Hint: Show that $E(X - c)$ equals zero by writing $E(X - c)$ as the sum
of two integrals: one from $-\infty$ to $c$ and the other from $c$ to $\infty$. In the first,
let $y = c - x$; and, in the second, $z = x - c$. Finally, use the symmetry
condition $f(c - y) = f(c + y)$ in the first.

1.94. Let the random variable $X$ have mean $\mu$, standard deviation $\sigma$, and
m.g.f. $M(t)$, $-h < t < h$. Show that



and 

1.95. Show that the m.g.f. of the random variable $X$ having the p.d.f. $f(x) =$
$-1 < x < 2$, zero elsewhere, is

M(t) = —— , t # 0 , 




ha < t < ha. 


= I, r = 0. 

1.96. Let $X$ be a random variable such that $E[(X - b)^2]$ exists for all real $b$.
Show that $E[(X - b)^2]$ is a minimum when $b = E(X)$.

1.97. Let X denote a random variable for which E[(X — a) 2 ] exists. Give an 
example of a distribution of a discrete type such that this expectation is 
zero. Such a distribution is called a degenerate distribution. 

1.98. Let $X$ be a random variable such that $K(t) = E(t^X)$ exists for
all real values of $t$ in a certain open interval that includes the point
$t = 1$. Show that $K^{(m)}(1)$ is equal to the $m$th factorial moment
$E[X(X - 1) \cdots (X - m + 1)]$.

1.99. Let $X$ be a random variable. If $m$ is a positive integer, the expectation
$E[(X - b)^m]$, if it exists, is called the $m$th moment of the distribution about
the point $b$. Let the first, second, and third moments of the distribution
about the point 7 be 3, 11, and 15, respectively. Determine the mean $\mu$ of
$X$, and then find the first, second, and third moments of the distribution
about the point $\mu$.
1.100. Let $X$ be a random variable such that $R(t) = E(e^{t(X - b)})$ exists for
$-h < t < h$. If $m$ is a positive integer, show that $R^{(m)}(0)$ is equal to the $m$th
moment of the distribution about the point $b$.

1.101. Let X be a random variable with mean $\mu$ and variance $\sigma^2$ such that the
third moment $E[(X - \mu)^3]$ about the vertical line through $\mu$ exists. The value
of the ratio $E[(X - \mu)^3]/\sigma^3$ is often used as a measure of skewness. Graph
each of the following probability density functions and show that this
measure is negative, zero, and positive for these respective distributions
(which are said to be skewed to the left, not skewed, and skewed to the
right, respectively).

(a) f(x) = (x + 1)/2, -1 < x < 1, zero elsewhere.

(b) f(x) = 1/2, -1 < x < 1, zero elsewhere.

(c) f(x) = (1 - x)/2, -1 < x < 1, zero elsewhere.

1.102. Let X be a random variable with mean $\mu$ and variance $\sigma^2$ such that the
fourth moment $E[(X - \mu)^4]$ about the vertical line through $\mu$ exists. The
value of the ratio $E[(X - \mu)^4]/\sigma^4$ is often used as a measure of kurtosis.
Graph each of the following probability density functions and show that
this measure is smaller for the first distribution.

(a) f(x) = 1/2, -1 < x < 1, zero elsewhere.

(b) $f(x) = 3(1 - x^2)/4$, -1 < x < 1, zero elsewhere.

1.103. Let the random variable X have p.d.f.

f(x) = p, x = -1, 1,

= 1 - 2p, x = 0,

= 0 elsewhere,

where 0 < p < 1/2. Find the measure of kurtosis as a function of p. Determine
its value when /? — 5 , /? = j, and p = Note that the kurtosis 

increases as p decreases. 

1.104. Let $\psi(t) = \ln M(t)$, where M(t) is the m.g.f. of a distribution. Prove that
$\psi'(0) = \mu$ and $\psi''(0) = \sigma^2$.

1.105. Find the mean and the variance of the distribution that has the
distribution function

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{x}{8}, & 0 \le x < 2, \\ \dfrac{x^2}{16}, & 2 \le x < 4, \\ 1, & 4 \le x. \end{cases}$$

1.106. Find the moments of the distribution that has m.g.f. $M(t) = (1 - t)^{-3}$,
t < 1.

Hint: Find the Maclaurin's series for M(t).

1.107. Let X be a random variable of the continuous type with p.d.f. f(x),
which is positive provided $0 < x < b < \infty$, and is equal to zero elsewhere.
Show that

$$E(X) = \int_0^b [1 - F(x)]\,dx,$$

where F(x) is the distribution function of X.

1.108. Let X be a random variable of the discrete type with p.d.f. f(x) that
is positive on the nonnegative integers and is equal to zero elsewhere. Show
that

$$E(X) = \sum_{x=0}^{\infty} [1 - F(x)],$$

where F(x) is the distribution function of X.

1.109. Let X have the p.d.f. f(x) = 1/k, x = 1, 2, ..., k, zero elsewhere. Show
that the m.g.f. is

$$M(t) = \begin{cases} \dfrac{e^t(1 - e^{kt})}{k(1 - e^t)}, & t \ne 0, \\ 1, & t = 0. \end{cases}$$

1.110. Let X have the distribution function F(x) that is a mixture of the
continuous and discrete types, namely

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{x + 1}{4}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Find $\mu = E(X)$ and $\sigma^2 = \operatorname{var}(X)$.

Hint: Determine that part of the p.d.f. associated with each of the
discrete and continuous types, and then sum for the discrete part and
integrate for the continuous part.

1.111. Consider k continuous-type distributions with the following
characteristics: p.d.f. $f_i(x)$, mean $\mu_i$, and variance $\sigma_i^2$, i = 1, 2, ..., k. If $c_i \ge 0$,
i = 1, 2, ..., k, and $c_1 + c_2 + \cdots + c_k = 1$, show that the mean and the
variance of the distribution having p.d.f. $c_1 f_1(x) + \cdots + c_k f_k(x)$ are

$$\mu = \sum_{i=1}^{k} c_i \mu_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{k} c_i\left[\sigma_i^2 + (\mu_i - \mu)^2\right],$$

respectively.

1.10 Chebyshev’s Inequality


In this section we prove a theorem that enables us to find upper (or 
lower) bounds for certain probabilities. These bounds, however, are 
not necessarily close to the exact probabilities and, accordingly, we 
ordinarily do not use the theorem to approximate a probability. The 
principal uses of the theorem and a special case of it are in theoretical 
discussions in other chapters. 

Theorem 6. Let u(X) be a nonnegative function of the random
variable X. If E[u(X)] exists, then, for every positive constant c,

$$\Pr[u(X) \ge c] \le \frac{E[u(X)]}{c}.$$


Proof. The proof is given when the random variable X is of the
continuous type; but the proof can be adapted to the discrete case
if we replace integrals by sums. Let $A = \{x : u(x) \ge c\}$ and let f(x)
denote the p.d.f. of X. Then

$$E[u(X)] = \int_{-\infty}^{\infty} u(x)f(x)\,dx = \int_{A} u(x)f(x)\,dx + \int_{A^*} u(x)f(x)\,dx,$$

where $A^*$ denotes the complement of A. Since each of the integrals in
the extreme right-hand member of the preceding equation is nonnegative,
the left-hand member is greater than or equal to either of them. In
particular,

$$E[u(X)] \ge \int_{A} u(x)f(x)\,dx.$$

However, if $x \in A$, then $u(x) \ge c$; accordingly, the right-hand member
of the preceding inequality is not increased if we replace u(x) by c. Thus

$$E[u(X)] \ge c \int_{A} f(x)\,dx.$$

Since

$$\int_{A} f(x)\,dx = \Pr(X \in A) = \Pr[u(X) \ge c],$$

it follows that

$$E[u(X)] \ge c \Pr[u(X) \ge c],$$

which is the desired result.

The preceding theorem is a generalization of an inequality that is
often called Chebyshev’s inequality. This inequality will now be
established.


Theorem 7. Chebyshev’s Inequality. Let the random variable X have
a distribution of probability about which we assume only that there is a
finite variance $\sigma^2$. This, of course, implies that there is a mean $\mu$. Then
for every k > 0,

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2},$$

or, equivalently,

$$\Pr(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}.$$

Proof. In Theorem 6 take $u(X) = (X - \mu)^2$ and $c = k^2\sigma^2$. Then we
have

$$\Pr[(X - \mu)^2 \ge k^2\sigma^2] \le \frac{E[(X - \mu)^2]}{k^2\sigma^2}.$$

Since the numerator of the right-hand member of the preceding
inequality is $\sigma^2$, the inequality may be written

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2},$$

which is the desired result. Naturally, we would take the positive
number k to be greater than 1 to have an inequality of interest.


It is seen that the number $1/k^2$ is an upper bound for the probability
$\Pr(|X - \mu| \ge k\sigma)$. In the following example this upper bound and the
exact value of the probability are compared in special instances.


Example 1. Let X have the p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{2\sqrt{3}}, & -\sqrt{3} < x < \sqrt{3}, \\ 0 & \text{elsewhere.} \end{cases}$$

Here $\mu = 0$ and $\sigma^2 = 1$. If $k = \tfrac{3}{2}$, we have the exact probability

$$\Pr(|X - \mu| \ge k\sigma) = \Pr\left(|X| \ge \tfrac{3}{2}\right) = 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = 1 - \frac{\sqrt{3}}{2}.$$

By Chebyshev’s inequality, the preceding probability has the upper bound
$1/k^2 = \tfrac{4}{9}$. Since $1 - \sqrt{3}/2 = 0.134$, approximately, the exact probability in
this case is considerably less than the upper bound $\tfrac{4}{9}$. If we take k = 2, we have
the exact probability $\Pr(|X - \mu| \ge 2\sigma) = \Pr(|X| \ge 2) = 0$. This again is
considerably less than the upper bound $1/k^2 = \tfrac{1}{4}$ provided by Chebyshev’s
inequality.
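
The comparison in Example 1 can be checked numerically. The following is only a minimal sketch: the function names `exact_tail_uniform` and `chebyshev_bound` are illustrative and do not come from the text, and the exact tail uses the fact that X is uniform on $(-\sqrt{3}, \sqrt{3})$ with $\mu = 0$ and $\sigma = 1$.

```python
import math


def exact_tail_uniform(k):
    """Exact Pr(|X| >= k) for X uniform on (-sqrt(3), sqrt(3)); here mu = 0, sigma = 1."""
    half_width = math.sqrt(3.0)
    if k >= half_width:
        return 0.0
    # Pr(|X| < k) is the length 2k of (-k, k) times the density 1 / (2 sqrt(3)).
    return 1.0 - k / half_width


def chebyshev_bound(k):
    """Upper bound 1/k^2 from Chebyshev's inequality."""
    return 1.0 / k**2


for k in (1.5, 2.0):
    print(f"k = {k}: exact = {exact_tail_uniform(k):.4f}, bound = {chebyshev_bound(k):.4f}")
```

For both values of k used in the example, the exact probability falls well below the bound.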

In each of the instances in the preceding example, the probability
$\Pr(|X - \mu| \ge k\sigma)$ and its upper bound $1/k^2$ differ considerably. This
suggests that this inequality might be made sharper. However, if we
want an inequality that holds for every k > 0 and holds for all random
variables having finite variance, such an improvement is impossible, as
is shown by the following example.

Example 2. Let the random variable X of the discrete type have
probabilities $\tfrac{1}{8}, \tfrac{6}{8}, \tfrac{1}{8}$ at the points x = -1, 0, 1, respectively. Here $\mu = 0$ and
$\sigma^2 = \tfrac{1}{4}$. If k = 2, then $1/k^2 = \tfrac{1}{4}$ and $\Pr(|X - \mu| \ge k\sigma) = \Pr(|X| \ge 1) = \tfrac{1}{4}$. That
is, the probability $\Pr(|X - \mu| \ge k\sigma)$ here attains the upper bound $1/k^2 = \tfrac{1}{4}$.
Hence the inequality cannot be improved without further assumptions about
the distribution of X.
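
That the bound is attained in Example 2 can be confirmed in exact arithmetic. This is a small sketch using Python's `fractions` module; the dictionary name `pmf` and the other variable names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Probabilities 1/8, 6/8, 1/8 at the points -1, 0, 1.
pmf = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

k = 2
# |x - mu| >= k*sigma is equivalent to (x - mu)^2 >= k^2 * sigma^2,
# which avoids taking a square root and keeps everything exact.
tail = sum(p for x, p in pmf.items() if (x - mean) ** 2 >= k**2 * variance)
bound = Fraction(1, k**2)

print(mean, variance, tail, bound)
```

The tail probability and the bound both come out to 1/4.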

EXERCISES 

1.112. Let X be a random variable with mean $\mu$ and let $E[(X - \mu)^{2k}]$ exist.
Show, with d > 0, that $\Pr(|X - \mu| \ge d) \le E[(X - \mu)^{2k}]/d^{2k}$. This is
essentially Chebyshev’s inequality when k = 1. The fact that this holds for
all k = 1, 2, 3, ..., when those (2k)th moments exist, usually provides a
much smaller upper bound for $\Pr(|X - \mu| \ge d)$ than does Chebyshev’s
result.

1.113. Let X be a random variable such that $\Pr(X \le 0) = 0$ and let $\mu = E(X)$
exist. Show that $\Pr(X \ge 2\mu) \le \tfrac{1}{2}$.

1.114. If X is a random variable such that E(X) = 3 and $E(X^2) = 13$, use
Chebyshev's inequality to determine a lower bound for the probability
Pr(-2 < X < 8).

1.115. Let X be a random variable with m.g.f. M(t), -h < t < h. Prove that

$$\Pr(X \ge a) \le e^{-at} M(t), \qquad 0 < t < h,$$

and that

$$\Pr(X \le a) \le e^{-at} M(t), \qquad -h < t < 0.$$

Hint: Let $u(x) = e^{tx}$ and $c = e^{ta}$ in Theorem 6. Note. These results imply
that $\Pr(X \ge a)$ and $\Pr(X \le a)$ are less than the respective greatest lower
bounds for $e^{-at} M(t)$ when 0 < t < h and when -h < t < 0.

1.116. The m.g.f. of X exists for all real values of t and is given by

$$M(t) = \frac{e^t - e^{-t}}{2t}, \quad t \ne 0, \qquad M(0) = 1.$$

Use the results of the preceding exercise to show that $\Pr(X \ge 1) = 0$ and
$\Pr(X \le -1) = 0$. Note that here h is infinite.


ADDITIONAL EXERCISES 

1.117. Players A and B play a sequence of independent games. Player A
throws a die first and wins on a “six.” If he fails, B throws and wins on a
“five” or “six.” If he fails, A throws again and wins on a “four,” “five,”
or “six.” And so on. Find the probability of each player winning the
sequence.

1.118. Let X be the number of gallons of ice cream that is requested at a
certain store on a hot summer day. Let us assume that the p.d.f. of X is
$f(x) = 12x(1000 - x)^2/10^{12}$, 0 < x < 1000, zero elsewhere. How many
gallons of ice cream should the store have on hand each of these days, so
that the probability of exhausting its supply on a particular day is 0.05?

1.119. Find the 25th percentile of the distribution having p.d.f. f(x) = |x|/4,
-2 < x < 2, zero elsewhere.

1.120. Let A t , A 2 , Aj be independent events with probabilities 5 , 
respectively. Compute $\Pr(A_1 \cup A_2 \cup A_3)$.

1.121. From a bowl containing 5 red, 3 white, and 7 blue chips, select 4 at 
random and without replacement. Compute the conditional probability of 
1 red, 0 white, and 3 blue chips, given that there are at least 3 blue chips 
in this sample of 4 chips. 

1.122. Let the three independent events A, B, and C be such that 
P(A) = P(B) = P(C) = Find $P[(A^* \cap B^*) \cup C]$.

1.123. Person A tosses a coin and then person B rolls a die. This is repeated 
independently until a head or one of the numbers 1,2,3,4 appears, at which 
time the game is stopped. Person A wins with the head and B wins with one
of the numbers 1, 2, 3, 4. Compute the probability that A wins the game. 

1.124. Find the mean and variance of the random variable X having
distribution function

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{x}{4}, & 0 \le x < 1, \\ \dfrac{x^2}{4}, & 1 \le x < 2, \\ 1, & 2 \le x. \end{cases}$$

1.125. Let X be a random variable having distribution function

F(x) = 0, x < 0,

= 2X 2 , 0 ^ x <[, 

=1 - 2(1 - jc) 2 , '^ x<l 
=1， l ^ x . 

Find Pr (i < A" < |) and the variance of the distribution. 

Hint: Note that there is a step in F(x).

1.126. Bowl I contains 7 red and 3 white chips and bowl II has 4 red and 6 
white chips. Two chips are selected at random and without replacement 
from I and transferred to II. Three chips are then selected at random and 
without replacement from II. 

(a) What is the probability that all three are white? 

(b) Given that three white chips are selected from II, what is the
conditional probability that two white chips were transferred from I? 

1.127. A bowl contains ten chips numbered 1,2,..., 10, respectively. Five 
chips are drawn at random, one at a time, and without replacement. What 
is the probability that exactly two even-numbered chips are drawn and they 
occur on even-numbered draws? 

1.128. Let [(JT) = p i , r = 1 ， 2, 3, ... . Find the series representation for 
the m.g.f. of X. Sum this series. 

1.129. Let X have the p.d.f. f(x) = 2x, 0 < x < 1, zero elsewhere. Compute
the probability that X is at least \ given that X is at least [. 

1.130. Divide a line segment into two parts by selecting a point at random. 
Find the probability that the larger segment is at least three times the 
shorter. Assume a uniform distribution. 

1.131. Three chips are selected at random and without replacement from a 
bowl containing 5 white, 4 black, and 7 red chips. Find the probability that 
these three chips are alike in color. 

1.132. Factories A, B, and C produce, respectively, 20, 30, and 50% of a 
certain company’s output. The items produced in A, B, and C are 1,2, and 
3 percent defective, respectively. We observe one item from the company's 
output at random and find it defective. What is the conditional probability 
that the item was from A? 


1.133. The probabilities that the independent events A, B, and C will occur
are H，and What is the probability that at least one of the three events 
will happen? 

1.134. A person bets 1 dollar to b dollars that he can draw two cards from 
an ordinary deck without replacement and that they will be of the same suit. 
Find b so that the bet will be fair. 

1.135. A bowl contains 6 chips: 4 are red and 2 are white. Three chips are
selected at random and without replacement; then a coin is tossed a number 
of independent times that is equal to the number of red chips in this sample 
of 3. For example, if we have 2 red and 1 white, the coin is tossed twice. 
Given that one head results, compute the conditional probability that the 
sample contains 1 red and 2 white.



CHAPTER 2


Multivariate Distributions


2.1 Distributions of Two Random Variables 

We begin the discussion of two random variables with the following
example. A coin is to be tossed three times and our interest is in
the ordered number pair (number of H’s on first two tosses, number
of H’s on all three tosses), where H and T represent, respectively, heads
and tails. Thus the sample space is $\mathcal{C} = \{c : c = c_i,\ i = 1, 2, \ldots, 8\}$,
where $c_1$ is TTT, $c_2$ is TTH, $c_3$ is THT, $c_4$ is HTT, $c_5$ is THH, $c_6$ is
HTH, $c_7$ is HHT, and $c_8$ is HHH. Let $X_1$ and $X_2$ be two functions
such that $X_1(c_1) = X_1(c_2) = 0$, $X_1(c_3) = X_1(c_4) = X_1(c_5) = X_1(c_6) = 1$,
$X_1(c_7) = X_1(c_8) = 2$; and $X_2(c_1) = 0$, $X_2(c_2) = X_2(c_3) = X_2(c_4) = 1$,
$X_2(c_5) = X_2(c_6) = X_2(c_7) = 2$, $X_2(c_8) = 3$. Thus $X_1$ and $X_2$ are
real-valued functions defined on the sample space $\mathcal{C}$, which take us
from that sample space to the space of ordered number pairs

$$\mathcal{A} = \{(0,0), (0,1), (1,1), (1,2), (2,2), (2,3)\}.$$

Thus $X_1$ and $X_2$ are two random variables defined on the space $\mathcal{C}$,
and, in this example, the space of these random variables is the
two-dimensional set $\mathcal{A}$ given immediately above. We now formulate the
definition of the space of two random variables.

Definition 1. Given a random experiment with a sample space $\mathcal{C}$.
Consider two random variables $X_1$ and $X_2$, which assign to each
element c of $\mathcal{C}$ one and only one ordered pair of numbers $X_1(c) = x_1$,
$X_2(c) = x_2$. The space of $X_1$ and $X_2$ is the set of ordered pairs
$\mathcal{A} = \{(x_1, x_2) : x_1 = X_1(c),\ x_2 = X_2(c),\ c \in \mathcal{C}\}$.

Let $\mathcal{A}$ be the space associated with the two random variables $X_1$ and
$X_2$ and let A be a subset of $\mathcal{A}$. As in the case of one random variable,
we shall speak of the event A. We wish to define the probability of the
event A, which we denote by $\Pr[(X_1, X_2) \in A]$. Take $C = \{c : c \in \mathcal{C}$ and
$[X_1(c), X_2(c)] \in A\}$, where $\mathcal{C}$ is the sample space. We then define
$\Pr[(X_1, X_2) \in A] = P(C)$, where P is the probability set function
defined for subsets C of $\mathcal{C}$. Here again we could denote $\Pr[(X_1, X_2) \in A]$
by the probability set function $P_{X_1, X_2}(A)$; but, with our previous
convention, we simply write

$$P(A) = \Pr[(X_1, X_2) \in A].$$

Again it is important to observe that this function is a probability set
function defined for subsets A of the space $\mathcal{A}$.

Let us return to the example in our discussion of two random
variables. Consider the subset A of $\mathcal{A}$, where A = {(1, 1), (1, 2)}.
To compute $\Pr[(X_1, X_2) \in A] = P(A)$, we must include as elements of C
all outcomes in $\mathcal{C}$ for which the random variables $X_1$ and $X_2$ take values
$(x_1, x_2)$ which are elements of A. Now $X_1(c_3) = 1$, $X_2(c_3) = 1$,
$X_1(c_4) = 1$, and $X_2(c_4) = 1$. Also, $X_1(c_5) = 1$, $X_2(c_5) = 2$, $X_1(c_6) = 1$,
and $X_2(c_6) = 2$. Thus $P(A) = \Pr[(X_1, X_2) \in A] = P(C)$, where
$C = \{c : c = c_3, c_4, c_5, \text{ or } c_6\}$. Suppose that our probability set function P(C)
assigns a probability of $\tfrac{1}{8}$ to each of the eight elements of $\mathcal{C}$. This
assignment seems reasonable if $P(T) = P(H) = \tfrac{1}{2}$ and the tosses are
independent. For illustration,

$$P(\{c_1\}) = \Pr(TTT) = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = \tfrac{1}{8}.$$

Then P(A), which can be written as $\Pr(X_1 = 1, X_2 = 1 \text{ or } 2)$, is equal
to $\tfrac{4}{8} = \tfrac{1}{2}$. It is left for the reader to show that we can tabulate the
probability, which is then assigned to each of the elements of $\mathcal{A}$, with
the following result:

(x_1, x_2)                        (0,0)  (0,1)  (1,1)  (1,2)  (2,2)  (2,3)

Pr[(X_1, X_2) = (x_1, x_2)]        1/8    1/8    2/8    2/8    1/8    1/8

This table depicts the distribution of probability over the elements of
$\mathcal{A}$, the space of the random variables $X_1$ and $X_2$.
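
The tabulation left to the reader can be reproduced by enumerating the eight equally likely outcomes. The following sketch assumes, as in the text, a fair coin and independent tosses; the names `joint` and `prob_A` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

joint = Counter()
for tosses in product("TH", repeat=3):   # the 8 outcomes TTT, TTH, ..., HHH
    x1 = tosses[:2].count("H")           # number of H's on the first two tosses
    x2 = tosses.count("H")               # number of H's on all three tosses
    joint[(x1, x2)] += Fraction(1, 8)    # each outcome has probability 1/8

for pair in sorted(joint):
    print(pair, joint[pair])

# P(A) for A = {(1, 1), (1, 2)}
prob_A = joint[(1, 1)] + joint[(1, 2)]
print("P(A) =", prob_A)
```

The six pairs printed are exactly the elements of the space of the two random variables, and their probabilities sum to 1.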

Again in statistics we are more interested in the space $\mathcal{A}$ of two
random variables, say X and Y, than that of $\mathcal{C}$. Moreover, the notion
of the p.d.f. of one random variable X can be extended to the notion
of the p.d.f. of two or more random variables. Under certain
restrictions on the space $\mathcal{A}$ and the function $f \ge 0$ on $\mathcal{A}$ (restrictions
that will not be enumerated here), we say that the two random variables
X and Y are of the discrete type or of the continuous type, and have
a distribution of that type, according as the probability set function
P(A), $A \subset \mathcal{A}$, can be expressed as

$$P(A) = \Pr[(X, Y) \in A] = \sum\sum_{A} f(x, y),$$

or as

$$P(A) = \Pr[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy.$$

In either case f is called the p.d.f. of the two random variables X and
Y. Of necessity, $P(\mathcal{A}) = 1$ in each case.

We may extend the definition of a p.d.f. f(x, y) over the entire
xy-plane by using zero elsewhere. We shall do this consistently so that
tedious, repetitious references to the space $\mathcal{A}$ can be avoided. Once this
is done, we replace

$$\iint_{\mathcal{A}} f(x, y)\,dx\,dy \quad \text{by} \quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy.$$

Similarly, after extending the definition of a p.d.f. of the discrete type,
we replace

$$\sum\sum_{\mathcal{A}} f(x, y) \quad \text{by} \quad \sum_{y}\sum_{x} f(x, y).$$

In accordance with this convention (of extending the definition of
a p.d.f.), it is seen that a point function f, whether in one or two
variables, essentially satisfies the conditions of being a p.d.f. if (a) f
is defined and is nonnegative for all real values of its argument(s) and
if (b) its integral [for the continuous type of random variable(s)], or
its sum [for the discrete type of random variable(s)] over all real values
of its argument(s) is 1.

Finally, if a p.d.f. in one or more variables is explicitly defined, we
can see by inspection whether the random variables are of the
continuous or discrete type. For example, it seems obvious that the p.d.f.

$$f(x, y) = \begin{cases} \dfrac{9}{4^{x+y}}, & x = 1, 2, 3, \ldots,\ y = 1, 2, 3, \ldots, \\ 0 & \text{elsewhere,} \end{cases}$$

is a p.d.f. of two discrete-type random variables X and Y, whereas the
p.d.f.

$$f(x, y) = \begin{cases} 4xy\,e^{-x^2 - y^2}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0 & \text{elsewhere,} \end{cases}$$

is clearly a p.d.f. of two continuous-type random variables X and Y.
In such cases it seems unnecessary to specify which of the two simpler
types of random variables is under consideration.

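Example 1. Let

$$f(x, y) = \begin{cases} 6x^2 y, & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{elsewhere,} \end{cases}$$

be the p.d.f. of two random variables X and Y, which must be of the
continuous type. We have, for instance,

$$\Pr\left(0 < X < \tfrac{3}{4},\ \tfrac{1}{3} < Y < 2\right) = \int_{1/3}^{2}\int_{0}^{3/4} f(x, y)\,dx\,dy$$

$$= \int_{1/3}^{1}\int_{0}^{3/4} 6x^2 y\,dx\,dy + \int_{1}^{2}\int_{0}^{3/4} 0\,dx\,dy = \tfrac{3}{8} + 0 = \tfrac{3}{8}.$$

Note that this probability is the volume under the surface $f(x, y) = 6x^2 y$ and
above the rectangular set $\{(x, y) : 0 < x < \tfrac{3}{4},\ \tfrac{1}{3} < y < 1\}$ in the xy-plane.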
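
The value 3/8 in Example 1 can be checked with a simple midpoint-rule approximation of the double integral. This is only a numerical sketch: the helper `midpoint_double_integral` and the grid size `n` are illustrative choices, and the result is an approximation rather than an exact evaluation.

```python
def f(x, y):
    """Joint p.d.f. of Example 1, extended by zero outside the unit square."""
    if 0.0 < x < 1.0 and 0.0 < y < 1.0:
        return 6.0 * x * x * y
    return 0.0


def midpoint_double_integral(func, x_lo, x_hi, y_lo, y_hi, n=400):
    """Approximate the integral of func over a rectangle with an n-by-n midpoint grid."""
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += func(x, y)
    return total * dx * dy


# Pr(0 < X < 3/4, 1/3 < Y < 2); the part with y > 1 contributes nothing.
prob = midpoint_double_integral(f, 0.0, 0.75, 1.0 / 3.0, 2.0)
print(round(prob, 4))
```

Because the p.d.f. has been extended by zero, the same function handles the strip 1 < y < 2 without a separate case, which mirrors the convention adopted in the text.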

Let the random variables X and Y have the probability set function
P(A), where A is a two-dimensional set. If A is the unbounded set
$\{(u, v) : u \le x,\ v \le y\}$, where x and y are real numbers, we have

$$P(A) = \Pr[(X, Y) \in A] = \Pr(X \le x,\ Y \le y).$$

This function of the point (x, y) is called the distribution function of X
and Y and is denoted by

$$F(x, y) = \Pr(X \le x,\ Y \le y).$$

If X and Y are random variables of the continuous type that have p.d.f.
f(x, y), then

$$F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u, v)\,du\,dv.$$

Accordingly, at points of continuity of f(x, y), we have

$$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y).$$

It is left as an exercise to show, in every case, that

$$\Pr(a < X \le b,\ c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c),$$

for all real constants a < b, c < d.

Consider next an experiment in which a person chooses
at random a point (X, Y) from the unit square $\mathcal{C} = \mathcal{A} =
\{(x, y) : 0 < x < 1,\ 0 < y < 1\}$. Suppose that our interest is not in X or
in Y but in Z = X + Y. Once a suitable probability model has been
adopted, we shall see how to find the p.d.f. of Z. To be specific, let the
nature of the random experiment be such that it is reasonable to assume
that the distribution of probability over the unit square is uniform.
Then the p.d.f. of X and Y may be written

$$f(x, y) = \begin{cases} 1, & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{elsewhere,} \end{cases}$$


and this describes the probability model. Now let the distribution
function of Z be denoted by $G(z) = \Pr(X + Y \le z)$. Then

$$G(z) = \begin{cases} 0, & z < 0, \\[4pt] \displaystyle\int_{0}^{z}\int_{0}^{z-x} dy\,dx = \frac{z^2}{2}, & 0 \le z < 1, \\[8pt] 1 - \displaystyle\int_{z-1}^{1}\int_{z-x}^{1} dy\,dx = 1 - \frac{(2 - z)^2}{2}, & 1 \le z < 2, \\[8pt] 1, & 2 \le z. \end{cases}$$

Since $G'(z)$ exists for all values of z, the p.d.f. of Z may then be written

$$g(z) = \begin{cases} z, & 0 < z < 1, \\ 2 - z, & 1 \le z < 2, \\ 0 & \text{elsewhere.} \end{cases}$$

It is clear that a different choice of the p.d.f. f(x, y) that describes
the probability model will, in general, lead to a different p.d.f. of
Z.
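
The distribution function of Z = X + Y derived above can be compared with a simulation. The sketch below assumes the uniform model on the unit square; the function names `G`, `g`, and `empirical_cdf`, the sample size, and the seed are illustrative choices rather than anything prescribed by the text.

```python
import random


def G(z):
    """Distribution function of Z = X + Y for (X, Y) uniform on the unit square."""
    if z < 0:
        return 0.0
    if z < 1:
        return z * z / 2
    if z < 2:
        return 1 - (2 - z) ** 2 / 2
    return 1.0


def g(z):
    """p.d.f. of Z, the derivative of G."""
    if 0 < z < 1:
        return z
    if 1 <= z < 2:
        return 2 - z
    return 0.0


def empirical_cdf(z, n=200_000, seed=0):
    """Monte Carlo estimate of Pr(X + Y <= z) from n simulated points."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() + rng.random() <= z)
    return hits / n


for z in (0.5, 1.0, 1.5):
    print(z, G(z), round(empirical_cdf(z), 3))
```

The simulated proportions should agree with G(z) to within ordinary sampling error.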

Let $f(x_1, x_2)$ be the p.d.f. of two random variables $X_1$ and $X_2$. From
this point on, for emphasis and clarity, we shall call a p.d.f. or a
distribution function a joint p.d.f. or a joint distribution function when
more than one random variable is involved. Thus $f(x_1, x_2)$ is the joint
p.d.f. of the random variables $X_1$ and $X_2$. Consider the event
$a < X_1 < b$, a < b. This event can occur when and only when the event
$a < X_1 < b$, $-\infty < X_2 < \infty$ occurs; that is, the two events are
equivalent, so that they have the same probability. But the probability
of the latter event has been defined and is given by

$$\Pr(a < X_1 < b,\ -\infty < X_2 < \infty) = \int_{a}^{b}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1$$

for the continuous case, and by

$$\Pr(a < X_1 < b,\ -\infty < X_2 < \infty) = \sum_{a < x_1 < b}\ \sum_{x_2} f(x_1, x_2)$$

for the discrete case. Now each of

$$\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad \text{and} \quad \sum_{x_2} f(x_1, x_2)$$

is a function of $x_1$ alone, say $f_1(x_1)$. Thus, for every a < b, we have

$$\Pr(a < X_1 < b) = \int_{a}^{b} f_1(x_1)\,dx_1 \quad \text{(continuous case),}$$

$$\phantom{\Pr(a < X_1 < b)} = \sum_{a < x_1 < b} f_1(x_1) \quad \text{(discrete case),}$$

so that $f_1(x_1)$ is the p.d.f. of $X_1$ alone. Since $f_1(x_1)$ is found by
summing (or integrating) the joint p.d.f. $f(x_1, x_2)$ over all $x_2$ for a
fixed $x_1$, we can think of recording this sum in the “margin” of the
$x_1 x_2$-plane. Accordingly, $f_1(x_1)$ is called the marginal p.d.f. of $X_1$. In
like manner

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \quad \text{(continuous case),}$$

$$\phantom{f_2(x_2)} = \sum_{x_1} f(x_1, x_2) \quad \text{(discrete case),}$$

is called the marginal p.d.f. of $X_2$.

Example 2. Consider a random experiment that consists of drawing at
random one chip from a bowl containing 10 chips of the same shape and size.
Each chip has an ordered pair of numbers on it: one with (1, 1), one with (2, 1),
two with (3, 1), one with (1, 2), two with (2, 2), and three with (3, 2). Let the
random variables $X_1$ and $X_2$ be defined as the respective first and second values
of the ordered pair. Thus the joint p.d.f. $f(x_1, x_2)$ of $X_1$ and $X_2$ can be given
by the following table, with $f(x_1, x_2)$ equal to zero elsewhere.

                    x_1
   x_2         1       2       3      f_2(x_2)

    1         1/10    1/10    2/10     4/10

    2         1/10    2/10    3/10     6/10

 f_1(x_1)     2/10    3/10    5/10

The joint probabilities have been summed in each row and each column and
these sums recorded in the margins to give the marginal probability density
functions of $X_1$ and $X_2$, respectively. Note that it is not necessary to have a
formula for $f(x_1, x_2)$ to do this.
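
The row and column sums of Example 2 can be computed directly from the table. A brief sketch in exact arithmetic follows; the dictionary `joint` simply transcribes the chip counts, and the names `f1` and `f2` are illustrative stand-ins for the two marginal p.d.f.s.

```python
from collections import defaultdict
from fractions import Fraction

# Joint p.d.f. f(x1, x2): number of chips carrying each pair, out of 10 chips.
joint = {
    (1, 1): Fraction(1, 10), (2, 1): Fraction(1, 10), (3, 1): Fraction(2, 10),
    (1, 2): Fraction(1, 10), (2, 2): Fraction(2, 10), (3, 2): Fraction(3, 10),
}

f1 = defaultdict(Fraction)  # marginal of X1: sum over x2 for fixed x1
f2 = defaultdict(Fraction)  # marginal of X2: sum over x1 for fixed x2
for (x1, x2), p in joint.items():
    f1[x1] += p
    f2[x2] += p

print(dict(f1))
print(dict(f2))
```

As the example notes, no formula for the joint p.d.f. is needed; the table alone determines both marginals.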

Example 3. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = \begin{cases} x_1 + x_2, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

The marginal p.d.f. of $X_1$ is

$$f_1(x_1) = \int_{0}^{1} (x_1 + x_2)\,dx_2 = x_1 + \tfrac{1}{2}, \qquad 0 < x_1 < 1,$$

zero elsewhere, and the marginal p.d.f. of $X_2$ is

$$f_2(x_2) = \int_{0}^{1} (x_1 + x_2)\,dx_1 = \tfrac{1}{2} + x_2, \qquad 0 < x_2 < 1,$$

zero elsewhere. A probability like $\Pr(X_1 \le \tfrac{1}{2})$ can be computed from either
$f_1(x_1)$ or $f(x_1, x_2)$ because

$$\int_{0}^{1/2}\int_{0}^{1} f(x_1, x_2)\,dx_2\,dx_1 = \int_{0}^{1/2} f_1(x_1)\,dx_1 = \tfrac{3}{8}.$$

However, to find a probability like $\Pr(X_1 + X_2 \le 1)$, we must use the joint
p.d.f. $f(x_1, x_2)$ as follows:

$$\int_{0}^{1}\int_{0}^{1 - x_1} (x_1 + x_2)\,dx_2\,dx_1 = \int_{0}^{1} \left[x_1(1 - x_1) + \frac{(1 - x_1)^2}{2}\right] dx_1 = \int_{0}^{1} \left(\frac{1}{2} - \frac{x_1^2}{2}\right) dx_1 = \frac{1}{3}.$$

This latter probability is the volume under the surface $f(x_1, x_2) = x_1 + x_2$
above the set $\{(x_1, x_2) : 0 < x_1,\ 0 < x_2,\ x_1 + x_2 \le 1\}$.
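
The two probabilities in Example 3 can be checked numerically. The sketch below uses a one-dimensional midpoint rule after the inner integral has been done by hand, following the same steps as the example; the helper `midpoint_integral` and the names `f1`, `inner`, `p_half`, and `p_sum` are illustrative.

```python
def midpoint_integral(func, lo, hi, n=10_000):
    """Approximate the integral of func over (lo, hi) with n midpoint panels."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def f1(x1):
    """Marginal p.d.f. of X1 on (0, 1): the joint p.d.f. integrated over x2."""
    return x1 + 0.5


def inner(x1):
    """Inner integral of (x1 + x2) over 0 < x2 < 1 - x1."""
    return x1 * (1 - x1) + (1 - x1) ** 2 / 2


p_half = midpoint_integral(f1, 0.0, 0.5)    # Pr(X1 <= 1/2)
p_sum = midpoint_integral(inner, 0.0, 1.0)  # Pr(X1 + X2 <= 1)
print(round(p_half, 6), round(p_sum, 6))
```

Only the first probability can be obtained from the marginal alone; the second needs the joint p.d.f. because the region of integration couples the two variables.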


EXERCISES 


2.1. Let $f(x_1, x_2) = 4x_1 x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere, be the

p.d.f. of X\ and X 2 . Find Pr (0 < A", < ^, ^ < < 1), Pr {X^ = X 2 ), 

$\Pr(X_1 < X_2)$, and $\Pr(X_1 \le X_2)$.

Hint: Recall that $\Pr(X_1 = X_2)$ would be the volume under the surface
$f(x_1, x_2) = 4x_1 x_2$ and above the line segment $0 < x_1 = x_2 < 1$ in the
$x_1 x_2$-plane.

2.2. Let $A_1 = \{(x, y) : x \le 2,\ y \le 4\}$, $A_2 = \{(x, y) : x \le 2,\ y \le 1\}$,
$A_3 = \{(x, y) : x \le 0,\ y \le 4\}$, and $A_4 = \{(x, y) : x \le 0,\ y \le 1\}$ be subsets of the
space $\mathcal{A}$ of two random variables X and Y, which is the entire
two-dimensional plane. If P(A,) = P(Ai) = P(Ai) = and P{A 4 ) = 
find $P(A_5)$, where $A_5 = \{(x, y) : 0 < x \le 2,\ 1 < y \le 4\}$.

2.3. Let F(x, y) be the distribution function of X and Y. Show that
$\Pr(a < X \le b,\ c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c)$, for all
real constants a < b, c < d.

2.4. Show that the function F(x, y) that is equal to 1 provided that $x + 2y \ge 1$,
and that is equal to zero provided that x + 2y < 1, cannot be a distribution
function of two random variables.

Hint: Find four numbers a < b, c < d, so that

$$F(b, d) - F(a, d) - F(b, c) + F(a, c)$$

is less than zero.

2.5. Given that the nonnegative function g(x) has the property that

$$\int_{0}^{\infty} g(x)\,dx = 1.$$

Show that

$$f(x_1, x_2) = \frac{2g\left(\sqrt{x_1^2 + x_2^2}\right)}{\pi\sqrt{x_1^2 + x_2^2}}, \qquad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$

zero elsewhere, satisfies the conditions of being a p.d.f. of two
continuous-type random variables $X_1$ and $X_2$.

Hint: Use polar coordinates.

2.6. Let $f(x, y) = e^{-x-y}$, $0 < x < \infty$, $0 < y < \infty$, zero elsewhere, be the
p.d.f. of X and Y. Then if Z = X + Y, compute $\Pr(Z \le 0)$, $\Pr(Z \le 6)$,
and, more generally, $\Pr(Z \le z)$, for $0 < z < \infty$. What is the p.d.f. of
Z?

2.7. Let X and Y have the p.d.f. f(x, y) = 1, 0 < x < 1, 0 < y < 1, zero
elsewhere. Find the p.d.f. of the product Z = XY.

2.8. Let 13 cards be taken, at random and without replacement, from an
ordinary deck of playing cards. If X is the number of spades in these 13
cards, find the p.d.f. of X. If, in addition, Y is the number of hearts in these
13 cards, find the probability Pr(X = 2, Y = 5). What is the joint p.d.f. of
X and Y?

2.9. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. described as
follows: 

(x_1, x_2)   (0,0) (0,1) (0,2) (1,0) (1,1) (1,2)

f( x \t -^ 2 ) tj n n n it 

and $f(x_1, x_2)$ is equal to zero elsewhere.

(a) Write these probabilities in a rectangular array as in Example 2,
recording each marginal p.d.f. in the “margins.”

(b) What is $\Pr(X_1 + X_2 = 1)$?

2.10. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 15x_1^2 x_2$, $0 < x_1 < x_2 < 1$,
zero elsewhere. Find each marginal p.d.f. and compute $\Pr(X_1 + X_2 \le 1)$.

Hint: Graph the space of $X_1$ and $X_2$ and carefully choose the limits
of integration in determining each marginal p.d.f.

2.2 Conditional Distributions and Expectations 

We shall now discuss the notion of a conditional p.d.f. Let
$X_1$ and $X_2$ denote random variables of the discrete type which
have the joint p.d.f. $f(x_1, x_2)$ which is positive on $\mathcal{A}$ and is
zero elsewhere. Let $f_1(x_1)$ and $f_2(x_2)$ denote, respectively, the
marginal probability density functions of $X_1$ and $X_2$. Take $A_1$ to be
the set $A_1 = \{(x_1, x_2) : x_1 = x_1',\ -\infty < x_2 < \infty\}$, where $x_1'$ is such
that $P(A_1) = \Pr(X_1 = x_1') = f_1(x_1') > 0$, and take $A_2$ to be the set
$A_2 = \{(x_1, x_2) : -\infty < x_1 < \infty,\ x_2 = x_2'\}$. Then, by definition, the
conditional probability of the event $A_2$, given the event $A_1$, is

$$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} = \frac{\Pr(X_1 = x_1',\ X_2 = x_2')}{\Pr(X_1 = x_1')} = \frac{f(x_1', x_2')}{f_1(x_1')}.$$

That is, if $(x_1, x_2)$ is any point at which $f_1(x_1) > 0$, the conditional
probability that $X_2 = x_2$, given that $X_1 = x_1$, is $f(x_1, x_2)/f_1(x_1)$. With $x_1$
held fast, and with $f_1(x_1) > 0$, this function of $x_2$ satisfies the conditions
of being a p.d.f. of a discrete type of random variable $X_2$ because
$f(x_1, x_2)/f_1(x_1)$ is nonnegative and

$$\sum_{x_2} \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{1}{f_1(x_1)} \sum_{x_2} f(x_1, x_2) = \frac{f_1(x_1)}{f_1(x_1)} = 1.$$

We now define the symbol $f_{2|1}(x_2 \mid x_1)$ by the relation

$$f_{2|1}(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}, \qquad f_1(x_1) > 0,$$

and we call $f_{2|1}(x_2 \mid x_1)$ the conditional p.d.f. of the discrete type of
random variable $X_2$, given that the discrete type of random variable
$X_1 = x_1$. In a similar manner we define the symbol $f_{1|2}(x_1 \mid x_2)$ by the
relation

$$f_{1|2}(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \qquad f_2(x_2) > 0,$$

and we call $f_{1|2}(x_1 \mid x_2)$ the conditional p.d.f. of the discrete type of
random variable $X_1$, given that the discrete type of random variable
$X_2 = x_2$.
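
As an illustration of the discrete definition, the conditional p.d.f. of $X_2$ given $X_1 = x_1$ can be computed for the chip-drawing table of Example 2 in Section 2.1. This is a sketch in exact arithmetic; the function name `conditional_2_given_1` is illustrative and not notation from the text.

```python
from fractions import Fraction

# Joint p.d.f. from the chip-drawing example (10 chips).
joint = {
    (1, 1): Fraction(1, 10), (2, 1): Fraction(1, 10), (3, 1): Fraction(2, 10),
    (1, 2): Fraction(1, 10), (2, 2): Fraction(2, 10), (3, 2): Fraction(3, 10),
}


def conditional_2_given_1(x1):
    """Return {x2: f(x1, x2) / f1(x1)} for a value x1 with f1(x1) > 0."""
    f1_x1 = sum(p for (a, _), p in joint.items() if a == x1)
    if f1_x1 == 0:
        raise ValueError("conditional p.d.f. is undefined when f1(x1) = 0")
    return {b: p / f1_x1 for (a, b), p in joint.items() if a == x1}


for x1 in (1, 2, 3):
    cond = conditional_2_given_1(x1)
    print(x1, cond, "sum =", sum(cond.values()))
```

For each fixed $x_1$ the conditional probabilities sum to 1, as the argument above requires.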

Now let $X_1$ and $X_2$ denote random variables of the continuous type
that have the joint p.d.f. $f(x_1, x_2)$ and the marginal probability density
functions $f_1(x_1)$ and $f_2(x_2)$, respectively. We shall use the results of the
preceding paragraph to motivate a definition of a conditional p.d.f. of
a continuous type of random variable. When $f_1(x_1) > 0$, we define the
symbol $f_{2|1}(x_2 \mid x_1)$ by the relation

$$f_{2|1}(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}.$$

In this relation, $x_1$ is to be thought of as having a fixed (but any fixed)
value for which $f_1(x_1) > 0$. It is evident that $f_{2|1}(x_2 \mid x_1)$ is nonnegative
and that

$$\int_{-\infty}^{\infty} f_{2|1}(x_2 \mid x_1)\,dx_2 = \int_{-\infty}^{\infty} \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2 = \frac{1}{f_1(x_1)} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \frac{1}{f_1(x_1)}\,f_1(x_1) = 1.$$

That is, $f_{2|1}(x_2 \mid x_1)$ has the properties of a p.d.f. of one continuous type
of random variable. It is called the conditional p.d.f. of the continuous
type of random variable $X_2$, given that the continuous type of random
variable $X_1$ has the value $x_1$. When $f_2(x_2) > 0$, the conditional p.d.f. of
the continuous type of random variable $X_1$, given that the continuous
type of random variable $X_2$ has the value $x_2$, is defined by

$$f_{1|2}(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \qquad f_2(x_2) > 0.$$

Since each of $f_{2|1}(x_2 \mid x_1)$ and $f_{1|2}(x_1 \mid x_2)$ is a p.d.f. of one random
variable (whether of the discrete or the continuous type), each has all
the properties of such a p.d.f. Thus we can compute probabilities and
mathematical expectations. If the random variables are of the
continuous type, the probability

$$\Pr(a < X_2 < b \mid X_1 = x_1) = \int_{a}^{b} f_{2|1}(x_2 \mid x_1)\,dx_2$$

is called “the conditional probability that $a < X_2 < b$, given that
$X_1 = x_1$.” If there is no ambiguity, this may be written in the
form $\Pr(a < X_2 < b \mid x_1)$. Similarly, the conditional probability that
$c < X_1 < d$, given $X_2 = x_2$, is

$$\Pr(c < X_1 < d \mid X_2 = x_2) = \int_{c}^{d} f_{1|2}(x_1 \mid x_2)\,dx_1.$$

If $u(X_2)$ is a function of $X_2$, the expectation

$$E[u(X_2) \mid x_1] = \int_{-\infty}^{\infty} u(x_2)\,f_{2|1}(x_2 \mid x_1)\,dx_2$$

is called the conditional expectation of $u(X_2)$, given that $X_1 = x_1$.
In particular, if they do exist, then $E(X_2 \mid x_1)$ is the mean and
$E\{[X_2 - E(X_2 \mid x_1)]^2 \mid x_1\}$ is the variance of the conditional distribution of
$X_2$, given $X_1 = x_1$, which can be written more simply as $\operatorname{var}(X_2 \mid x_1)$. It
is convenient to refer to these as the “conditional mean” and the
“conditional variance” of $X_2$, given $X_1 = x_1$. Of course, we have

$$\operatorname{var}(X_2 \mid x_1) = E(X_2^2 \mid x_1) - [E(X_2 \mid x_1)]^2$$

from an earlier result. In like manner, the conditional expectation of
$u(X_1)$, given $X_2 = x_2$, is given by

$$E[u(X_1) \mid x_2] = \int_{-\infty}^{\infty} u(x_1)\,f_{1|2}(x_1 \mid x_2)\,dx_1.$$

With random variables of the discrete type, these conditional
probabilities and conditional expectations are computed by using
summation instead of integration. An illustrative example follows.
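Example 1. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = 2, \qquad 0 < x_1 < x_2 < 1,$$

zero elsewhere. Then the marginal probability density functions are, respectively,

$$f_1(x_1) = \int_{x_1}^{1} 2\,dx_2 = 2(1 - x_1), \qquad 0 < x_1 < 1,$$

zero elsewhere, and

$$f_2(x_2) = \int_{0}^{x_2} 2\,dx_1 = 2x_2, \qquad 0 < x_2 < 1,$$

zero elsewhere. The conditional p.d.f. of $X_1$, given $X_2 = x_2$, $0 < x_2 < 1$, is

$$f_{1|2}(x_1|x_2) = \frac{2}{2x_2} = \frac{1}{x_2}, \qquad 0 < x_1 < x_2,$$

zero elsewhere. Here the conditional mean and conditional variance of $X_1$, given $X_2 = x_2$, are, respectively,

$$E(X_1|x_2) = \int_{-\infty}^{\infty} x_1 f_{1|2}(x_1|x_2)\,dx_1 = \int_0^{x_2} x_1 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2}{2}, \qquad 0 < x_2 < 1,$$

and

$$\text{var}(X_1|x_2) = \int_0^{x_2} \left(x_1 - \frac{x_2}{2}\right)^2 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2^2}{12}, \qquad 0 < x_2 < 1.$$

Finally, we shall compare the values of

$$\Pr(0 < X_1 < \tfrac{1}{2} \mid X_2 = \tfrac{3}{4}) \quad\text{and}\quad \Pr(0 < X_1 < \tfrac{1}{2}).$$

We have

$$\Pr(0 < X_1 < \tfrac{1}{2} \mid X_2 = \tfrac{3}{4}) = \int_0^{1/2} f_{1|2}(x_1|\tfrac{3}{4})\,dx_1 = \int_0^{1/2} \tfrac{4}{3}\,dx_1 = \tfrac{2}{3},$$

but

$$\Pr(0 < X_1 < \tfrac{1}{2}) = \int_0^{1/2} f_1(x_1)\,dx_1 = \int_0^{1/2} 2(1 - x_1)\,dx_1 = \tfrac{3}{4}.$$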
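The quantities in Example 1 can be checked numerically. The following is only a sketch: the helper names (`integrate`, `cond_pdf`, `marginal_pdf_x1`) and the midpoint rule are illustrative assumptions, not part of the text, and the conditioning value $x_2 = 3/4$ is simply the one used for illustration.

```python
def integrate(g, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * width) for i in range(n)) * width


def cond_pdf(x1, x2):
    """Conditional p.d.f. of X1 given X2 = x2: 1/x2 on 0 < x1 < x2, else 0."""
    return 1.0 / x2 if 0.0 < x1 < x2 else 0.0


def marginal_pdf_x1(x1):
    """Marginal p.d.f. of X1: 2(1 - x1) on 0 < x1 < 1, else 0."""
    return 2.0 * (1.0 - x1) if 0.0 < x1 < 1.0 else 0.0


x2 = 0.75  # conditioning value chosen for illustration

# Conditional mean and variance of X1 given X2 = x2.
cond_mean = integrate(lambda x1: x1 * cond_pdf(x1, x2), 0.0, x2)
cond_var = integrate(lambda x1: (x1 - cond_mean) ** 2 * cond_pdf(x1, x2), 0.0, x2)

# Conditional versus unconditional probability of 0 < X1 < 1/2.
p_conditional = integrate(lambda x1: cond_pdf(x1, x2), 0.0, 0.5)
p_marginal = integrate(marginal_pdf_x1, 0.0, 0.5)

print(cond_mean, cond_var, p_conditional, p_marginal)
```

The computed values should agree, up to discretization error, with $x_2/2$, $x_2^2/12$, and the two probabilities compared at the end of the example.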

Since $E(X_2|x_1)$ is a function of $x_1$, then $E(X_2|X_1)$ is a random variable with its own distribution, mean, and variance. Let us consider the following illustration of this.

Example 2. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = 6x_2, \qquad 0 < x_2 < x_1 < 1,$$

zero elsewhere. Then the marginal p.d.f. of $X_1$ is

$$f_1(x_1) = \int_0^{x_1} 6x_2\,dx_2 = 3x_1^2, \qquad 0 < x_1 < 1,$$

zero elsewhere. The conditional p.d.f. of $X_2$, given $X_1 = x_1$, is

$$f_{2|1}(x_2|x_1) = \frac{6x_2}{3x_1^2} = \frac{2x_2}{x_1^2}, \qquad 0 < x_2 < x_1,$$

zero elsewhere, where $0 < x_1 < 1$. The conditional mean of $X_2$, given $X_1 = x_1$, is

$$E(X_2|x_1) = \int_0^{x_1} x_2 \left(\frac{2x_2}{x_1^2}\right) dx_2 = \frac{2}{3}x_1, \qquad 0 < x_1 < 1.$$

Now $E(X_2|X_1) = 2X_1/3$ is a random variable, say $Y$. The distribution function of $Y = 2X_1/3$ is

$$G(y) = \Pr(Y \le y) = \Pr\left(X_1 \le \frac{3y}{2}\right), \qquad 0 \le y < \frac{2}{3}.$$

From the p.d.f. $f_1(x_1)$, we have

$$G(y) = \int_0^{3y/2} 3x_1^2\,dx_1 = \frac{27y^3}{8}, \qquad 0 \le y < \frac{2}{3}.$$

Of course, $G(y) = 0$ if $y < 0$, and $G(y) = 1$ if $\frac{2}{3} \le y$. The p.d.f., mean, and variance of $Y = 2X_1/3$ are

$$g(y) = \frac{81y^2}{8}, \qquad 0 < y < \frac{2}{3},$$

zero elsewhere,

$$E(Y) = \int_0^{2/3} y \left(\frac{81y^2}{8}\right) dy = \frac{1}{2},$$

and

$$\text{var}(Y) = \int_0^{2/3} y^2 \left(\frac{81y^2}{8}\right) dy - \frac{1}{4} = \frac{1}{60}.$$

Since the marginal p.d.f. of $X_2$ is

$$f_2(x_2) = \int_{x_2}^{1} 6x_2\,dx_1 = 6x_2(1 - x_2), \qquad 0 < x_2 < 1,$$

zero elsewhere, it is easy to show that $E(X_2) = \frac{1}{2}$ and $\text{var}(X_2) = \frac{1}{20}$. That is, here

$$E(Y) = E[E(X_2|X_1)] = E(X_2)$$

and

$$\text{var}(Y) = \text{var}[E(X_2|X_1)] \le \text{var}(X_2).$$
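The arithmetic of Example 2 can be reproduced exactly with rational numbers. This is a minimal sketch; the helper `moment_of_power_density` is a hypothetical name introduced here, and it relies only on the fact that both densities involved are sums of terms of the form $c\,w^p$ on an interval $(0, \text{upper})$.

```python
from fractions import Fraction


def moment_of_power_density(coef, power, upper, order):
    """Integral of w**order * coef * w**power over 0 < w < upper."""
    n = power + order + 1
    return coef * upper ** n / n


# Y = E(X2 | X1) = 2*X1/3 has p.d.f. g(y) = 81*y**2/8 on 0 < y < 2/3.
mean_y = moment_of_power_density(Fraction(81, 8), 2, Fraction(2, 3), 1)
var_y = moment_of_power_density(Fraction(81, 8), 2, Fraction(2, 3), 2) - mean_y ** 2


def moment_x2(order):
    """E(X2**order) for f2(x2) = 6*x2 - 6*x2**2 on 0 < x2 < 1."""
    return (moment_of_power_density(Fraction(6), 1, Fraction(1), order)
            - moment_of_power_density(Fraction(6), 2, Fraction(1), order))


mean_x2 = moment_x2(1)
var_x2 = moment_x2(2) - mean_x2 ** 2

print(mean_y, var_y, mean_x2, var_x2)
```

The two means coincide, while the variance of $Y$ is one third of the variance of $X_2$, in line with the inequality stated at the end of the example.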

Example 2 is excellent, as it provides us with the opportunity to apply many of these new definitions as well as review the distribution function technique for finding the distribution of a function of a random variable, namely $Y = 2X_1/3$. Moreover, the two observations at the end of Example 2 are no accident because it is true, in general, that

$$E[E(X_2|X_1)] = E(X_2) \quad\text{and}\quad \text{var}[E(X_2|X_1)] \le \text{var}(X_2).$$

To prove these two facts, we must first comment on the expectation of a function of two random variables, say $u(X_1, X_2)$. We do this for the continuous case, but the argument holds in the discrete case with summations replacing integrals. Of course, $Y = u(X_1, X_2)$ is a random variable and has a p.d.f., say $g(y)$, and

$$E(Y) = \int_{-\infty}^{\infty} y\,g(y)\,dy.$$

However, as before, it can be proved (Section 4.7) that $E(Y)$ equals

$$E[u(X_1, X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1, x_2) f(x_1, x_2)\,dx_1\,dx_2.$$

We call $E[u(X_1, X_2)]$ the expectation (mathematical expectation or expected value) of $u(X_1, X_2)$, and it can be shown to be a linear operator as in the one-variable case. We also note that the expected value of $X_2$ can be found in two ways:

$$E(X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} x_2 f_2(x_2)\,dx_2,$$

the latter single integral being obtained from the double integral by integrating on $x_1$ first.

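Example 3. Let $X_1$ and $X_2$ have the p.d.f.

$$f(x_1, x_2) = 8x_1x_2, \qquad 0 < x_1 < x_2 < 1,$$

zero elsewhere. Then

$$E(X_1X_2^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1x_2^2 f(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\int_0^{x_2} 8x_1^2x_2^3\,dx_1\,dx_2 = \int_0^1 \frac{8}{3}x_2^6\,dx_2 = \frac{8}{21}.$$

In addition,

$$E(X_2) = \int_0^1\int_0^{x_2} x_2(8x_1x_2)\,dx_1\,dx_2 = \frac{4}{5}.$$

Since $X_2$ has the p.d.f. $f_2(x_2) = 4x_2^3$, $0 < x_2 < 1$, zero elsewhere, the latter expectation can be found by

$$E(X_2) = \int_0^1 x_2(4x_2^3)\,dx_2 = \frac{4}{5}.$$

Finally,

$$E(7X_1X_2^2 + 5X_2) = 7E(X_1X_2^2) + 5E(X_2) = (7)\left(\tfrac{8}{21}\right) + (5)\left(\tfrac{4}{5}\right) = \tfrac{20}{3}.$$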
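The expectations in Example 3 all reduce to one closed form, which makes the linearity step easy to confirm. The sketch below assumes the joint p.d.f. $8x_1x_2$ on $0 < x_1 < x_2 < 1$; the function name `joint_moment` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction


def joint_moment(a, b):
    """E(X1**a * X2**b) for f(x1, x2) = 8*x1*x2 on 0 < x1 < x2 < 1.

    The inner integral over 0 < x1 < x2 gives 8*x2**(a+b+3)/(a+2);
    the outer integral over 0 < x2 < 1 then gives 8/((a+2)*(a+b+4)).
    """
    return Fraction(8, (a + 2) * (a + b + 4))


total_probability = joint_moment(0, 0)   # should be 1 for a p.d.f.
e_x1_x2sq = joint_moment(1, 2)           # E(X1 * X2**2)
e_x2 = joint_moment(0, 1)                # E(X2)

# Linearity of expectation: E(7*X1*X2**2 + 5*X2).
combined = 7 * e_x1_x2sq + 5 * e_x2

print(total_probability, e_x1_x2sq, e_x2, combined)
```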


We begin the proof of $E[E(X_2|X_1)] = E(X_2)$ and $\text{var}[E(X_2|X_1)] \le \text{var}(X_2)$ by noting that

$$E(X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_2\,dx_1 = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x_2 \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right] f_1(x_1)\,dx_1$$

$$= \int_{-\infty}^{\infty} E(X_2|x_1) f_1(x_1)\,dx_1 = E[E(X_2|X_1)],$$

which is the first result. Consider next, with $\mu_2 = E(X_2)$,

$$\text{var}(X_2) = E[(X_2 - \mu_2)^2]$$

$$= E\{[X_2 - E(X_2|X_1) + E(X_2|X_1) - \mu_2]^2\}$$

$$= E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\} + 2E\{[X_2 - E(X_2|X_1)][E(X_2|X_1) - \mu_2]\}.$$

We shall show that the last term of the right-hand member of the immediately preceding equation is zero. It is equal to

$$2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)][E(X_2|x_1) - \mu_2] f(x_1, x_2)\,dx_2\,dx_1$$

$$= 2\int_{-\infty}^{\infty} [E(X_2|x_1) - \mu_2] \left\{\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)] \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right\} f_1(x_1)\,dx_1.$$

But $E(X_2|x_1)$ is the conditional mean of $X_2$, given $X_1 = x_1$. Since the expression in the inner braces is equal to

$$E(X_2|x_1) - E(X_2|x_1) = 0,$$

the double integral is equal to zero. Accordingly, we have

$$\text{var}(X_2) = E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\}.$$

The first term in the right-hand member of this equation is nonnegative because it is the expected value of a nonnegative function, namely $[X_2 - E(X_2|X_1)]^2$. Since $E[E(X_2|X_1)] = \mu_2$, the second term will be the $\text{var}[E(X_2|X_1)]$. Hence we have

$$\text{var}(X_2) \ge \text{var}[E(X_2|X_1)],$$

which completes the proof.

Intuitively, this result could have this useful interpretation. Both the random variables $X_2$ and $E(X_2|X_1)$ have the same mean $\mu_2$. If we did not know $\mu_2$, we could use either of the two random variables to guess at the unknown $\mu_2$. Since, however, $\text{var}(X_2) \ge \text{var}[E(X_2|X_1)]$, we would put more reliance in $E(X_2|X_1)$ as a guess. That is, if we observe the pair $(X_1, X_2)$ to be $(x_1, x_2)$, we would prefer to use $E(X_2|x_1)$ to $x_2$ as a guess at the unknown $\mu_2$. When studying the use of sufficient statistics in estimation in Chapter 7, we make use of this famous result, attributed to C. R. Rao and David Blackwell.
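A small simulation can make this interpretation concrete. The sketch below assumes the joint p.d.f. $6x_2$ on $0 < x_2 < x_1 < 1$ from Example 2 and draws pairs by inverting the distribution functions $x_1^3$ and $(x_2/x_1)^2$; the seed, sample size, and variable names are arbitrary choices made here for illustration.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the run is repeatable
n_draws = 50_000

raw_guesses = []       # the observed x2 values
smoothed_guesses = []  # E(X2 | x1) = 2*x1/3 evaluated at the observed x1

for _ in range(n_draws):
    # X1 has distribution function x1**3 on (0, 1): invert a uniform draw.
    x1 = rng.random() ** (1.0 / 3.0)
    # Given X1 = x1, X2 has distribution function (x2/x1)**2 on (0, x1).
    x2 = x1 * rng.random() ** 0.5
    raw_guesses.append(x2)
    smoothed_guesses.append(2.0 * x1 / 3.0)

mean_raw = statistics.fmean(raw_guesses)
mean_smoothed = statistics.fmean(smoothed_guesses)
var_raw = statistics.pvariance(raw_guesses)
var_smoothed = statistics.pvariance(smoothed_guesses)

print(mean_raw, mean_smoothed, var_raw, var_smoothed)
```

Both sample means should sit near $\mu_2 = 1/2$, while the sample variance of the conditional-mean guess should be noticeably smaller than that of the raw observations.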


EXERCISES 

2.11. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = x_1 + x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere. Find the conditional mean and variance of $X_2$, given $X_1 = x_1$, $0 < x_1 < 1$.

2.12. Let $f_{1|2}(x_1|x_2) = c_1x_1/x_2^2$, $0 < x_1 < x_2$, $0 < x_2 < 1$, zero elsewhere, and $f_2(x_2) = c_2x_2^4$, $0 < x_2 < 1$, zero elsewhere, denote, respectively, the conditional p.d.f. of $X_1$, given $X_2 = x_2$, and the marginal p.d.f. of $X_2$. Determine:

(a) The constants $c_1$ and $c_2$.

(b) The joint p.d.f. of $X_1$ and $X_2$.

(c) $\Pr(\frac{1}{4} < X_1 < \frac{1}{2} \mid X_2 = \frac{5}{8})$.

(d) $\Pr(\frac{1}{4} < X_1 < \frac{1}{2})$.

2.13. Let $f(x_1, x_2) = 21x_1^2x_2^3$, $0 < x_1 < x_2 < 1$, zero elsewhere, be the joint p.d.f. of $X_1$ and $X_2$.

(a) Find the conditional mean and variance of $X_1$, given $X_2 = x_2$, $0 < x_2 < 1$.

(b) Find the distribution of $Y = E(X_1|X_2)$.

(c) Determine $E(Y)$ and $\text{var}(Y)$ and compare these to $E(X_1)$ and $\text{var}(X_1)$, respectively.

2.14. If $X_1$ and $X_2$ are random variables of the discrete type having p.d.f. $f(x_1, x_2) = (x_1 + 2x_2)/18$, $(x_1, x_2) = (1, 1), (1, 2), (2, 1), (2, 2)$, zero elsewhere, determine the conditional mean and variance of $X_2$, given $X_1 = x_1$, for $x_1 = 1$ or 2. Also compute $E(3X_1 - 2X_2)$.

2.15. Five cards are drawn at random and without replacement from a bridge deck. Let the random variables $X_1$, $X_2$, and $X_3$ denote, respectively, the number of spades, the number of hearts, and the number of diamonds that appear among the five cards.

(a) Determine the joint p.d.f. of $X_1$, $X_2$, and $X_3$.

(b) Find the marginal probability density functions of $X_1$, $X_2$, and $X_3$.

(c) What is the joint conditional p.d.f. of $X_2$ and $X_3$, given that $X_1 = 3$?

2.16. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$ described as follows:

$$\begin{array}{c|cccccc}
(x_1, x_2) & (0, 0) & (0, 1) & (1, 0) & (1, 1) & (2, 0) & (2, 1) \\ \hline
f(x_1, x_2) & \frac{1}{18} & \frac{3}{18} & \frac{4}{18} & \frac{3}{18} & \frac{6}{18} & \frac{1}{18}
\end{array}$$

and $f(x_1, x_2)$ is equal to zero elsewhere. Find the two marginal probability density functions and the two conditional means.

Hint: Write the probabilities in a rectangular array.

2.17. Let us choose at random a point from the interval (0, 1) and let the random variable $X_1$ be equal to the number which corresponds to that point. Then choose a point at random from the interval $(0, x_1)$, where $x_1$ is the experimental value of $X_1$; and let the random variable $X_2$ be equal to the number which corresponds to this point.

(a) Make assumptions about the marginal p.d.f. $f_1(x_1)$ and the conditional p.d.f. $f_{2|1}(x_2|x_1)$.

(b) Compute $\Pr(X_1 + X_2 \ge 1)$.

(c) Find the conditional mean $E(X_1|x_2)$.

2.18. Let $f(x)$ and $F(x)$ denote, respectively, the p.d.f. and the distribution function of the random variable $X$. The conditional p.d.f. of $X$, given $X > x_0$, $x_0$ a fixed number, is defined by $f(x|X > x_0) = f(x)/[1 - F(x_0)]$, $x_0 < x$, zero elsewhere. This kind of conditional p.d.f. finds application in a problem of time until death, given survival until time $x_0$.

(a) Show that $f(x|X > x_0)$ is a p.d.f.

(b) Let $f(x) = e^{-x}$, $0 < x < \infty$, and zero elsewhere. Compute $\Pr(X > 2 \mid X > 1)$.

2.19. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 6(1 - x - y)$, $0 < x$, $0 < y$, $x + y < 1$, and zero elsewhere. Compute $\Pr(2X + 3Y < 1)$ and $E(XY + 2X^2)$.

2.3 The Correlation Coefficient 

Because the result that we obtain in this section is more familiar in terms of $X$ and $Y$, we use $X$ and $Y$ rather than $X_1$ and $X_2$ as symbols for our two random variables. Let $X$ and $Y$ have joint p.d.f. $f(x, y)$. If $u(x, y)$ is a function of $x$ and $y$, then $E[u(X, Y)]$ was defined, subject to its existence, in Section 2.2. The existence of all mathematical expectations will be assumed in this discussion. The means of $X$ and $Y$, say $\mu_1$ and $\mu_2$, are obtained by taking $u(x, y)$ to be $x$ and $y$, respectively; and the variances of $X$ and $Y$, say $\sigma_1^2$ and $\sigma_2^2$, are obtained by setting the function $u(x, y)$ equal to $(x - \mu_1)^2$ and $(y - \mu_2)^2$, respectively. Consider the mathematical expectation

$$E[(X - \mu_1)(Y - \mu_2)] = E(XY - \mu_2X - \mu_1Y + \mu_1\mu_2)$$

$$= E(XY) - \mu_2E(X) - \mu_1E(Y) + \mu_1\mu_2$$

$$= E(XY) - \mu_1\mu_2.$$

This number is called the covariance of $X$ and $Y$ and is often denoted by $\text{cov}(X, Y)$. If each of $\sigma_1$ and $\sigma_2$ is positive, the number

$$\rho = \frac{E[(X - \mu_1)(Y - \mu_2)]}{\sigma_1\sigma_2} = \frac{\text{cov}(X, Y)}{\sigma_1\sigma_2}$$

is called the correlation coefficient of $X$ and $Y$. If the standard deviations are positive, the correlation coefficient of any two random variables is defined to be the covariance of the two random variables divided by the product of the standard deviations of the two random variables. It should be noted that the expected value of the product of two random variables is equal to the product of their expectations plus their covariance; that is, $E(XY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2 = \mu_1\mu_2 + \text{cov}(X, Y)$.

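Example 1. Let the random variables $X$ and $Y$ have the joint p.d.f.

$$f(x, y) = x + y, \qquad 0 < x < 1, \quad 0 < y < 1,$$

zero elsewhere. We shall compute the correlation coefficient of $X$ and $Y$. When only two variables are under consideration, we shall denote the correlation coefficient by $\rho$. Now

$$\mu_1 = E(X) = \int_0^1\int_0^1 x(x + y)\,dx\,dy = \frac{7}{12}$$

and

$$\sigma_1^2 = E(X^2) - \mu_1^2 = \int_0^1\int_0^1 x^2(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = \frac{11}{144}.$$

Similarly,

$$\mu_2 = E(Y) = \frac{7}{12} \quad\text{and}\quad \sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{11}{144}.$$

The covariance of $X$ and $Y$ is

$$E(XY) - \mu_1\mu_2 = \int_0^1\int_0^1 xy(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = -\frac{1}{144}.$$

Accordingly, the correlation coefficient of $X$ and $Y$ is

$$\rho = \frac{-\frac{1}{144}}{\sqrt{\left(\frac{11}{144}\right)\left(\frac{11}{144}\right)}} = -\frac{1}{11}.$$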
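The moments used in Example 1 follow from a single closed form, since the integral of $x^a y^b (x + y)$ over the unit square splits into two products of one-dimensional integrals. The sketch below is one way to check the arithmetic; the helper name `moment` is an assumption introduced for illustration.

```python
from fractions import Fraction
import math


def moment(a, b):
    """E(X**a * Y**b) for f(x, y) = x + y on the unit square.

    The x**(a+1) * y**b term contributes 1/((a+2)*(b+1)) and the
    x**a * y**(b+1) term contributes 1/((a+1)*(b+2)).
    """
    return Fraction(1, (a + 2) * (b + 1)) + Fraction(1, (a + 1) * (b + 2))


mu1, mu2 = moment(1, 0), moment(0, 1)
var1 = moment(2, 0) - mu1 ** 2
var2 = moment(0, 2) - mu2 ** 2
cov = moment(1, 1) - mu1 * mu2

# The square root forces a floating-point result for rho.
rho = float(cov) / math.sqrt(float(var1) * float(var2))

print(mu1, var1, cov, rho)
```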


Remark. For certain kinds of distributions of two random variables, say $X$ and $Y$, the correlation coefficient $\rho$ proves to be a very useful characteristic of the distribution. Unfortunately, the formal definition of $\rho$ does not reveal this fact. At this time we make some observations about $\rho$, some of which will be explored more fully at a later stage. It will soon be seen that if a joint distribution of two variables has a correlation coefficient (that is, if both of the variances are positive), then $\rho$ satisfies $-1 \le \rho \le 1$. If $\rho = 1$, there is a line with equation $y = a + bx$, $b > 0$, the graph of which contains all of the probability of the distribution of $X$ and $Y$. In this extreme case, we have $\Pr(Y = a + bX) = 1$. If $\rho = -1$, we have the same state of affairs except that $b < 0$. This suggests the following interesting question: When $\rho$ does not have one of its extreme values, is there a line in the $xy$-plane such that the probability for $X$ and $Y$ tends to be concentrated in a band about this line? Under certain restrictive conditions this is in fact the case, and under those conditions we can look upon $\rho$ as a measure of the intensity of the concentration of the probability for $X$ and $Y$ about that line.


Next, let $f(x, y)$ denote the joint p.d.f. of two random variables $X$ and $Y$ and let $f_1(x)$ denote the marginal p.d.f. of $X$. The conditional p.d.f. of $Y$, given $X = x$, is

$$f_{2|1}(y|x) = \frac{f(x, y)}{f_1(x)}$$

at points where $f_1(x) > 0$. Then the conditional mean of $Y$, given $X = x$, is given by

$$E(Y|x) = \int_{-\infty}^{\infty} y f_{2|1}(y|x)\,dy = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{f_1(x)},$$

when dealing with random variables of the continuous type. This conditional mean of $Y$, given $X = x$, is, of course, a function of $x$ alone, say $m(x)$. In like vein, the conditional mean of $X$, given $Y = y$, is a function of $y$ alone, say $v(y)$.

In case $m(x)$ is a linear function of $x$, say $m(x) = a + bx$, we say the conditional mean of $Y$ is linear in $x$; or that $Y$ has a linear conditional mean. When $m(x) = a + bx$, the constants $a$ and $b$ have simple values which will now be determined.

It will be assumed that neither $\sigma_1^2$ nor $\sigma_2^2$, the variances of $X$ and $Y$, is zero. From

$$E(Y|x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{f_1(x)} = a + bx,$$

we have

$$\int_{-\infty}^{\infty} y f(x, y)\,dy = (a + bx) f_1(x). \qquad (1)$$

If both members of Equation (1) are integrated on $x$, it is seen that

$$E(Y) = a + bE(X),$$

or

$$\mu_2 = a + b\mu_1, \qquad (2)$$

where $\mu_1 = E(X)$ and $\mu_2 = E(Y)$. If both members of Equation (1) are first multiplied by $x$ and then integrated on $x$, we have

$$E(XY) = aE(X) + bE(X^2),$$

or

$$\rho\sigma_1\sigma_2 + \mu_1\mu_2 = a\mu_1 + b(\sigma_1^2 + \mu_1^2), \qquad (3)$$

where $\rho\sigma_1\sigma_2$ is the covariance of $X$ and $Y$. The simultaneous solution of Equations (2) and (3) yields

$$a = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1 \quad\text{and}\quad b = \rho\frac{\sigma_2}{\sigma_1}.$$

That is,

$$m(x) = E(Y|x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)$$

is the conditional mean of $Y$, given $X = x$, when the conditional mean of $Y$ is linear in $x$. If the conditional mean of $X$, given $Y = y$, is linear in $y$, then that conditional mean is given by

$$v(y) = E(X|y) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2).$$

We shall next investigate the variance of a conditional distribution under the assumption that the conditional mean is linear. The conditional variance of $Y$ is given by

$$\text{var}(Y|x) = \int_{-\infty}^{\infty} \left[y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f_{2|1}(y|x)\,dy = \frac{\int_{-\infty}^{\infty} \left[(y - \mu_2) - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f(x, y)\,dy}{f_1(x)} \qquad (4)$$

when the random variables are of the continuous type. This variance is nonnegative and is at most a function of $x$ alone. If, then, it is multiplied by $f_1(x)$ and integrated on $x$, the result obtained will be nonnegative. This result is

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(y - \mu_2) - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f(x, y)\,dy\,dx$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(y - \mu_2)^2 - 2\rho\frac{\sigma_2}{\sigma_1}(y - \mu_2)(x - \mu_1) + \rho^2\frac{\sigma_2^2}{\sigma_1^2}(x - \mu_1)^2\right] f(x, y)\,dy\,dx$$

$$= E[(Y - \mu_2)^2] - 2\rho\frac{\sigma_2}{\sigma_1}E[(X - \mu_1)(Y - \mu_2)] + \rho^2\frac{\sigma_2^2}{\sigma_1^2}E[(X - \mu_1)^2]$$

$$= \sigma_2^2 - 2\rho\frac{\sigma_2}{\sigma_1}\rho\sigma_1\sigma_2 + \rho^2\frac{\sigma_2^2}{\sigma_1^2}\sigma_1^2$$

$$= \sigma_2^2 - 2\rho^2\sigma_2^2 + \rho^2\sigma_2^2 = \sigma_2^2(1 - \rho^2) \ge 0.$$

That is, if the variance, Equation (4), is denoted by $k(x)$, then $E[k(X)] = \sigma_2^2(1 - \rho^2) \ge 0$. Accordingly, $\rho^2 \le 1$, or $-1 \le \rho \le 1$. It is left as an exercise to prove that $-1 \le \rho \le 1$ whether the conditional mean is or is not linear.

Suppose that the variance, Equation (4), is positive but not a function of $x$; that is, the variance is a constant $k > 0$. Now if $k$ is multiplied by $f_1(x)$ and integrated on $x$, the result is $k$, so that $k = \sigma_2^2(1 - \rho^2)$. Thus, in this case, the variance of each conditional distribution of $Y$, given $X = x$, is $\sigma_2^2(1 - \rho^2)$. If $\rho = 0$, the variance of each conditional distribution of $Y$, given $X = x$, is $\sigma_2^2$, the variance of the marginal distribution of $Y$. On the other hand, if $\rho^2$ is near one, the variance of each conditional distribution of $Y$, given $X = x$, is relatively small, and there is a high concentration of the probability for this conditional distribution near the mean $E(Y|x) = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$.

It should be pointed out that if the random variables $X$ and $Y$ in the preceding discussion are taken to be of the discrete type, the results just obtained are valid.

Example 2. Let the random variables $X$ and $Y$ have the linear conditional means $E(Y|x) = 4x + 3$ and $E(X|y) = \frac{1}{16}y - 3$. In accordance with the general formulas for the linear conditional means, we see that $E(Y|x) = \mu_2$ if $x = \mu_1$ and $E(X|y) = \mu_1$ if $y = \mu_2$. Accordingly, in this special case, we have $\mu_2 = 4\mu_1 + 3$ and $\mu_1 = \frac{1}{16}\mu_2 - 3$ so that $\mu_1 = -\frac{15}{4}$ and $\mu_2 = -12$. The general formulas for the linear conditional means also show that the product of the coefficients of $x$ and $y$, respectively, is equal to $\rho^2$ and that the quotient of these coefficients is equal to $\sigma_2^2/\sigma_1^2$. Here $\rho^2 = 4(\frac{1}{16}) = \frac{1}{4}$ with $\rho = \frac{1}{2}$ (not $-\frac{1}{2}$), and $\sigma_2^2/\sigma_1^2 = 64$. Thus, from the two linear conditional means, we are able to find the values of $\mu_1$, $\mu_2$, $\rho$, and $\sigma_2/\sigma_1$, but not the values of $\sigma_1$ and $\sigma_2$.
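The steps of Example 2 amount to solving two linear equations and reading $\rho^2$ and the variance ratio off the slopes. The sketch below writes the two conditional means as $E(Y|x) = a_1 + b_1x$ and $E(X|y) = a_2 + b_2y$; these coefficient names are introduced here only for illustration.

```python
from fractions import Fraction
import math

# E(Y|x) = a1 + b1*x and E(X|y) = a2 + b2*y, as given in the example.
a1, b1 = Fraction(3), Fraction(4)
a2, b2 = Fraction(-3), Fraction(1, 16)

# The lines cross at (mu1, mu2): mu2 = a1 + b1*mu1 and mu1 = a2 + b2*mu2.
mu1 = (a2 + b2 * a1) / (1 - b1 * b2)
mu2 = a1 + b1 * mu1

# b1 = rho*sigma2/sigma1 and b2 = rho*sigma1/sigma2.
rho_squared = b1 * b2
variance_ratio = b1 / b2  # sigma2**2 / sigma1**2

# rho carries the common sign of the two slopes.
rho = math.copysign(math.sqrt(float(rho_squared)), float(b1))

print(mu1, mu2, rho_squared, variance_ratio, rho)
```

Nothing in these relations pins down $\sigma_1$ or $\sigma_2$ separately, which is why only their ratio appears.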

Example 3. To illustrate how the correlation coefficient measures the intensity of the concentration of the probability for $X$ and $Y$ about a line, let these random variables have a distribution that is uniform over the area depicted in Figure 2.1. That is, the joint p.d.f. of $X$ and $Y$ is

$$f(x, y) = \frac{1}{4ah}, \qquad -a + bx < y < a + bx, \quad -h < x < h,$$

zero elsewhere. We assume here that $b \ge 0$, but the argument can be modified for $b \le 0$. It is easy to show that the p.d.f. of $X$ is uniform, namely

$$f_1(x) = \int_{-a+bx}^{a+bx} \frac{1}{4ah}\,dy = \frac{1}{2h}, \qquad -h < x < h,$$

zero elsewhere. Thus the conditional p.d.f. of $Y$, given $X = x$, is uniform:

$$f_{2|1}(y|x) = \frac{1/4ah}{1/2h} = \frac{1}{2a}, \qquad -a + bx < y < a + bx,$$

zero elsewhere. The conditional mean and variance are

$$E(Y|x) = bx \quad\text{and}\quad \text{var}(Y|x) = \frac{a^2}{3}.$$

From the general expressions for those characteristics we know that

$$b = \rho\frac{\sigma_2}{\sigma_1} \quad\text{and}\quad \frac{a^2}{3} = \sigma_2^2(1 - \rho^2).$$

In addition, we know that $\sigma_1^2 = h^2/3$. If we solve these three equations, we obtain an expression for the correlation coefficient, namely

$$\rho = \frac{bh}{\sqrt{a^2 + b^2h^2}}.$$

Referring to Figure 2.1, we note:

1. As $a$ gets small (large), the straight line effect is more (less) intense and $\rho$ is closer to 1 (zero).

2. As $h$ gets large (small), the straight line effect is more (less) intense and $\rho$ is closer to 1 (zero).

3. As $b$ gets large (small), the straight line effect is more (less) intense and $\rho$ is closer to 1 (zero).
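The three observations can be illustrated by evaluating the formula $\rho = bh/\sqrt{a^2 + b^2h^2}$ at a few parameter settings. The function name and the particular values of $a$, $b$, and $h$ below are arbitrary choices made for this sketch.

```python
import math


def band_correlation(a, b, h):
    """rho = b*h / sqrt(a**2 + b**2 * h**2) for the uniform band, b >= 0."""
    return b * h / math.sqrt(a * a + b * b * h * h)


baseline = band_correlation(a=1.0, b=1.0, h=1.0)
narrow_band = band_correlation(a=0.1, b=1.0, h=1.0)   # smaller a
long_band = band_correlation(a=1.0, b=1.0, h=10.0)    # larger h
steep_band = band_correlation(a=1.0, b=10.0, h=1.0)   # larger b
flat_band = band_correlation(a=1.0, b=0.0, h=1.0)     # b = 0

print(baseline, narrow_band, long_band, steep_band, flat_band)
```

Shrinking $a$, or enlarging $h$ or $b$, pushes the value toward 1, while $b = 0$ gives a correlation of zero.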

This section will conclude with a definition and an illustrative example. Let $f(x, y)$ denote the joint p.d.f. of the two random variables $X$ and $Y$. If $E(e^{t_1X + t_2Y})$ exists for $-h_1 < t_1 < h_1$, $-h_2 < t_2 < h_2$, where $h_1$ and $h_2$ are positive, it is denoted by $M(t_1, t_2)$ and is called the moment-generating function (m.g.f.) of the joint distribution of $X$ and $Y$. As in the case of one random variable, the m.g.f. $M(t_1, t_2)$ completely determines the joint distribution of $X$ and $Y$, and hence the marginal distributions of $X$ and $Y$. In fact, the m.g.f. $M_1(t_1)$ of $X$ is

$$M_1(t_1) = E(e^{t_1X}) = M(t_1, 0)$$

and the m.g.f. $M_2(t_2)$ of $Y$ is

$$M_2(t_2) = E(e^{t_2Y}) = M(0, t_2).$$

In addition, in the case of random variables of the continuous type,

$$\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^k\,\partial t_2^m} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m e^{t_1x + t_2y} f(x, y)\,dx\,dy,$$

so that

$$\left.\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^k\,\partial t_2^m}\right|_{t_1 = t_2 = 0} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m f(x, y)\,dx\,dy = E(X^kY^m).$$

For instance, in a simplified notation which appears to be clear,

$$\mu_1 = E(X) = \frac{\partial M(0, 0)}{\partial t_1}, \qquad \mu_2 = E(Y) = \frac{\partial M(0, 0)}{\partial t_2},$$

$$\sigma_1^2 = E(X^2) - \mu_1^2 = \frac{\partial^2 M(0, 0)}{\partial t_1^2} - \mu_1^2, \qquad (5)$$

$$\sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0, 0)}{\partial t_2^2} - \mu_2^2,$$

$$E[(X - \mu_1)(Y - \mu_2)] = \frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} - \mu_1\mu_2,$$

and from these we can compute the correlation coefficient $\rho$.

It is fairly obvious that the results of Equations (5) hold if $X$ and $Y$ are random variables of the discrete type. Thus the correlation coefficients may be computed by using the m.g.f. of the joint distribution if that function is readily available. An illustrative example follows. In this, we let $e^w = \exp(w)$.

Example 4. Let the continuous-type random variables $X$ and $Y$ have the joint p.d.f.

$$f(x, y) = e^{-y}, \qquad 0 < x < y < \infty,$$

zero elsewhere. The m.g.f. of this joint distribution is

$$M(t_1, t_2) = \int_0^{\infty}\int_x^{\infty} \exp(t_1x + t_2y - y)\,dy\,dx = \frac{1}{(1 - t_1 - t_2)(1 - t_2)},$$

provided that $t_1 + t_2 < 1$ and $t_2 < 1$. For this distribution, Equations (5) become

$$\mu_1 = 1, \qquad \mu_2 = 2,$$

$$\sigma_1^2 = 1, \qquad \sigma_2^2 = 2, \qquad (6)$$

$$E[(X - \mu_1)(Y - \mu_2)] = 1.$$

Verification of results of Equations (6) is left as an exercise. If, momentarily, we accept these results, the correlation coefficient of $X$ and $Y$ is $\rho = 1/\sqrt{2}$. Furthermore, the moment-generating functions of the marginal distributions of $X$ and $Y$ are, respectively,

$$M(t_1, 0) = \frac{1}{1 - t_1}, \qquad t_1 < 1,$$

$$M(0, t_2) = \frac{1}{(1 - t_2)^2}, \qquad t_2 < 1.$$

These moment-generating functions are, of course, respectively, those of the marginal probability density functions,

$$f_1(x) = \int_x^{\infty} e^{-y}\,dy = e^{-x}, \qquad 0 < x < \infty,$$

zero elsewhere, and

$$f_2(y) = \int_0^{y} e^{-y}\,dx = ye^{-y}, \qquad 0 < y < \infty,$$

zero elsewhere.
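The moments listed for Example 4 can be approximated by differentiating the m.g.f. numerically at the origin. The following is only a sketch: the central-difference helpers and the step size are assumptions chosen for illustration, and the results carry a small discretization error rather than being exact.

```python
import math


def mgf(t1, t2):
    """M(t1, t2) = 1 / ((1 - t1 - t2)(1 - t2)), valid near the origin."""
    return 1.0 / ((1.0 - t1 - t2) * (1.0 - t2))


STEP = 1e-4  # finite-difference step; an arbitrary small value


def d_t1(f, h=STEP):
    return (f(h, 0.0) - f(-h, 0.0)) / (2.0 * h)


def d_t2(f, h=STEP):
    return (f(0.0, h) - f(0.0, -h)) / (2.0 * h)


def d_t1_t1(f, h=STEP):
    return (f(h, 0.0) - 2.0 * f(0.0, 0.0) + f(-h, 0.0)) / (h * h)


def d_t2_t2(f, h=STEP):
    return (f(0.0, h) - 2.0 * f(0.0, 0.0) + f(0.0, -h)) / (h * h)


def d_t1_t2(f, h=STEP):
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)


mu1 = d_t1(mgf)
mu2 = d_t2(mgf)
var1 = d_t1_t1(mgf) - mu1 ** 2
var2 = d_t2_t2(mgf) - mu2 ** 2
cov = d_t1_t2(mgf) - mu1 * mu2
rho = cov / math.sqrt(var1 * var2)

print(mu1, mu2, var1, var2, cov, rho)
```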


EXERCISES 


2.20. Let the random variables $X$ and $Y$ have the joint p.d.f.

(a) $f(x, y) = \frac{1}{3}$, $(x, y) = (0, 0), (1, 1), (2, 2)$, zero elsewhere.

(b) $f(x, y) = \frac{1}{3}$, $(x, y) = (0, 2), (1, 1), (2, 0)$, zero elsewhere.

(c) $f(x, y) = \frac{1}{3}$, $(x, y) = (0, 0), (1, 1), (2, 0)$, zero elsewhere.

In each case compute the correlation coefficient of $X$ and $Y$.

2.21. Let $X$ and $Y$ have the joint p.d.f. described as follows:

$$\begin{array}{c|cccccc}
(x, y) & (1, 1) & (1, 2) & (1, 3) & (2, 1) & (2, 2) & (2, 3) \\ \hline
f(x, y) & \frac{2}{15} & \frac{4}{15} & \frac{3}{15} & \frac{1}{15} & \frac{1}{15} & \frac{4}{15}
\end{array}$$

and $f(x, y)$ is equal to zero elsewhere. (a) Find the means $\mu_1$ and $\mu_2$, the variances $\sigma_1^2$ and $\sigma_2^2$, and the correlation coefficient $\rho$. (b) Compute $E(Y|X = 1)$, $E(Y|X = 2)$, and the line $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Do the points $[k, E(Y|X = k)]$, $k = 1, 2$, lie on this line?

2.22. Let $f(x, y) = 2$, $0 < x < y$, $0 < y < 1$, zero elsewhere, be the joint p.d.f. of $X$ and $Y$. Show that the conditional means are, respectively, $(1 + x)/2$, $0 < x < 1$, and $y/2$, $0 < y < 1$. Show that the correlation coefficient of $X$ and $Y$ is $\rho = \frac{1}{2}$.

2.23. Show that the variance of the conditional distribution of $Y$, given $X = x$, in Exercise 2.22, is $(1 - x)^2/12$, $0 < x < 1$, and that the variance of the conditional distribution of $X$, given $Y = y$, is $y^2/12$, $0 < y < 1$.

2.24. Verify the results of Equations (6) of this section.

2.25. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 1$, $-x < y < x$, $0 < x < 1$, zero elsewhere. Show that, on the set of positive probability density, the graph of $E(Y|x)$ is a straight line, whereas that of $E(X|y)$ is not a straight line.

2.26. If the correlation coefficient $\rho$ of $X$ and $Y$ exists, show that $-1 \le \rho \le 1$.

Hint: Consider the discriminant of the nonnegative quadratic function $h(v) = E\{[(X - \mu_1) + v(Y - \mu_2)]^2\}$, where $v$ is real and is not a function of $X$ nor of $Y$.

2.27. Let $\psi(t_1, t_2) = \ln M(t_1, t_2)$, where $M(t_1, t_2)$ is the m.g.f. of $X$ and $Y$. Show that

$$\frac{\partial \psi(0, 0)}{\partial t_i}, \qquad \frac{\partial^2 \psi(0, 0)}{\partial t_i^2}, \qquad i = 1, 2,$$

and

$$\frac{\partial^2 \psi(0, 0)}{\partial t_1\,\partial t_2}$$

yield the means, the variances, and the covariance of the two random variables. Use this result to find the means, the variances, and the covariance of $X$ and $Y$ of Example 4.

2.4 Independent Random Variables 

Let $X_1$ and $X_2$ denote random variables of either the continuous or the discrete type which have the joint p.d.f. $f(x_1, x_2)$ and marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. In accordance with the definition of the conditional p.d.f. $f_{2|1}(x_2|x_1)$, we may write the joint p.d.f. $f(x_1, x_2)$ as

$$f(x_1, x_2) = f_{2|1}(x_2|x_1) f_1(x_1).$$

Suppose that we have an instance where $f_{2|1}(x_2|x_1)$ does not depend upon $x_1$. Then the marginal p.d.f. of $X_2$ is, for random variables of the continuous type,

$$f_2(x_2) = \int_{-\infty}^{\infty} f_{2|1}(x_2|x_1) f_1(x_1)\,dx_1 = f_{2|1}(x_2|x_1) \int_{-\infty}^{\infty} f_1(x_1)\,dx_1 = f_{2|1}(x_2|x_1).$$

Accordingly,

$$f_2(x_2) = f_{2|1}(x_2|x_1) \quad\text{and}\quad f(x_1, x_2) = f_1(x_1) f_2(x_2),$$

when $f_{2|1}(x_2|x_1)$ does not depend upon $x_1$. That is, if the conditional distribution of $X_2$, given $X_1 = x_1$, is independent of any assumption about $x_1$, then $f(x_1, x_2) = f_1(x_1) f_2(x_2)$. These considerations motivate the following definition.

Definition 2. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$ and the marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. The random variables $X_1$ and $X_2$ are said to be independent if, and only if, $f(x_1, x_2) \equiv f_1(x_1) f_2(x_2)$. Random variables that are not independent are said to be dependent.

Remarks. Two comments should be made about the preceding definition. First, the product of two positive functions $f_1(x_1) f_2(x_2)$ means a function that is positive on a product space. That is, if $f_1(x_1)$ and $f_2(x_2)$ are positive on, and only on, the respective spaces $\mathcal{A}_1$ and $\mathcal{A}_2$, then the product of $f_1(x_1)$ and $f_2(x_2)$ is positive on, and only on, the product space $\mathcal{A} = \{(x_1, x_2) : x_1 \in \mathcal{A}_1, x_2 \in \mathcal{A}_2\}$. For instance, if $\mathcal{A}_1 = \{x_1 : 0 < x_1 < 1\}$ and $\mathcal{A}_2 = \{x_2 : 0 < x_2 < 3\}$, then $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < 1, 0 < x_2 < 3\}$. The second remark pertains to the identity. The identity in Definition 2 should be interpreted as follows. There may be certain points $(x_1, x_2) \in \mathcal{A}$ at which $f(x_1, x_2) \ne f_1(x_1) f_2(x_2)$. However, if $A$ is the set of points $(x_1, x_2)$ at which the equality does not hold, then $P(A) = 0$. In the subsequent theorems and the subsequent generalizations, a product of nonnegative functions and an identity should be interpreted in an analogous manner.

Example 1. Let the joint p.d.f. of $X_1$ and $X_2$ be

$$f(x_1, x_2) = x_1 + x_2, \qquad 0 < x_1 < 1, \quad 0 < x_2 < 1,$$

zero elsewhere. It will be shown that $X_1$ and $X_2$ are dependent. Here the marginal probability density functions are

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \int_0^1 (x_1 + x_2)\,dx_2 = x_1 + \tfrac{1}{2}, \qquad 0 < x_1 < 1,$$

zero elsewhere, and

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 = \int_0^1 (x_1 + x_2)\,dx_1 = \tfrac{1}{2} + x_2, \qquad 0 < x_2 < 1,$$

zero elsewhere. Since $f(x_1, x_2) \not\equiv f_1(x_1) f_2(x_2)$, the random variables $X_1$ and $X_2$ are dependent.

The following theorem makes it possible to assert, without computing the marginal probability density functions, that the random variables $X_1$ and $X_2$ of Example 1 are dependent.

Theorem 1. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$. Then $X_1$ and $X_2$ are independent if and only if $f(x_1, x_2)$ can be written as a product of a nonnegative function of $x_1$ alone and a nonnegative function of $x_2$ alone. That is,

$$f(x_1, x_2) \equiv g(x_1) h(x_2),$$

where $g(x_1) > 0$, $x_1 \in \mathcal{A}_1$, zero elsewhere, and $h(x_2) > 0$, $x_2 \in \mathcal{A}_2$, zero elsewhere.

Proof. If $X_1$ and $X_2$ are independent, then $f(x_1, x_2) \equiv f_1(x_1) f_2(x_2)$, where $f_1(x_1)$ and $f_2(x_2)$ are the marginal probability density functions of $X_1$ and $X_2$, respectively. Thus the condition $f(x_1, x_2) \equiv g(x_1) h(x_2)$ is fulfilled.

Conversely, if $f(x_1, x_2) \equiv g(x_1) h(x_2)$, then, for random variables of the continuous type, we have

$$f_1(x_1) = \int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_2 = g(x_1) \int_{-\infty}^{\infty} h(x_2)\,dx_2 = c_1 g(x_1)$$

and

$$f_2(x_2) = \int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_1 = h(x_2) \int_{-\infty}^{\infty} g(x_1)\,dx_1 = c_2 h(x_2),$$

where $c_1$ and $c_2$ are constants, not functions of $x_1$ or $x_2$. Moreover, $c_1c_2 = 1$ because

$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_1\,dx_2 = \left[\int_{-\infty}^{\infty} g(x_1)\,dx_1\right] \left[\int_{-\infty}^{\infty} h(x_2)\,dx_2\right] = c_2c_1.$$

These results imply that

$$f(x_1, x_2) \equiv g(x_1) h(x_2) \equiv c_1 g(x_1) c_2 h(x_2) \equiv f_1(x_1) f_2(x_2).$$

Accordingly, $X_1$ and $X_2$ are independent.

If we now refer to Example 1, we see that the joint p.d.f.

$$f(x_1, x_2) = x_1 + x_2, \qquad 0 < x_1 < 1, \quad 0 < x_2 < 1,$$

zero elsewhere, cannot be written as the product of a nonnegative function of $x_1$ alone and a nonnegative function of $x_2$ alone. Accordingly, $X_1$ and $X_2$ are dependent.
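One quick numerical symptom of a failed factorization is the following. If $f(x_1, x_2) = g(x_1)h(x_2)$ on a product space, then $f(a, c)f(b, d) = f(a, d)f(b, c)$ for any points in that space, so a single violation rules out the product form. The sketch below applies this check; the function names, the test points, and the comparison density $4x_1x_2$ on the unit square are hypothetical choices made for illustration, and passing the check at a few points does not by itself prove independence.

```python
def cross_product_gap(f, a, b, c, d):
    """f(a, c)*f(b, d) - f(a, d)*f(b, c); zero whenever f(x1, x2) = g(x1)*h(x2)."""
    return f(a, c) * f(b, d) - f(a, d) * f(b, c)


def sum_density(x1, x2):
    """Joint p.d.f. x1 + x2 on the unit square (inside the support only)."""
    return x1 + x2


def product_density(x1, x2):
    """Hypothetical comparison p.d.f. 4*x1*x2 on the unit square."""
    return 4.0 * x1 * x2


gap_sum = cross_product_gap(sum_density, 0.2, 0.8, 0.2, 0.8)
gap_product = cross_product_gap(product_density, 0.2, 0.8, 0.2, 0.8)

print(gap_sum, gap_product)
```

A nonzero gap for $x_1 + x_2$ is consistent with the conclusion that this p.d.f. has no product form, whereas the comparison density gives a gap of zero up to rounding.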

Example 2. Let the p.d.f. of the random variables $X_1$ and $X_2$ be $f(x_1, x_2) = 8x_1x_2$, $0 < x_1 < x_2 < 1$, zero elsewhere. The formula $8x_1x_2$ might suggest to some that $X_1$ and $X_2$ are independent. However, if we consider the space $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < x_2 < 1\}$, we see that it is not a product space. This should make it clear that, in general, $X_1$ and $X_2$ must be dependent if the space of positive probability density of $X_1$ and $X_2$ is bounded by a curve that is neither a horizontal nor a vertical line.

We now give a theorem that frequently simplifies the calculations 
of probabilities of events which involve independent variables. 

Theorem 2. If $X_1$ and $X_2$ are independent random variables with marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively, then

$$\Pr(a < X_1 < b, c < X_2 < d) = \Pr(a < X_1 < b)\Pr(c < X_2 < d)$$

for every $a < b$ and $c < d$, where $a$, $b$, $c$, and $d$ are constants.

Proof. From the independence of $X_1$ and $X_2$, the joint p.d.f. of $X_1$ and $X_2$ is $f_1(x_1) f_2(x_2)$. Accordingly, in the continuous case,

$$\Pr(a < X_1 < b, c < X_2 < d) = \int_a^b\int_c^d f_1(x_1) f_2(x_2)\,dx_2\,dx_1 = \left[\int_a^b f_1(x_1)\,dx_1\right] \left[\int_c^d f_2(x_2)\,dx_2\right]$$

$$= \Pr(a < X_1 < b)\Pr(c < X_2 < d);$$

or, in the discrete case,

$$\Pr(a < X_1 < b, c < X_2 < d) = \sum_{a < x_1 < b}\ \sum_{c < x_2 < d} f_1(x_1) f_2(x_2) = \left[\sum_{a < x_1 < b} f_1(x_1)\right] \left[\sum_{c < x_2 < d} f_2(x_2)\right]$$

$$= \Pr(a < X_1 < b)\Pr(c < X_2 < d),$$

as was to be shown.


Example 3. In Example 1, $X_1$ and $X_2$ were found to be dependent. There, in general,

$$\Pr(a < X_1 < b, c < X_2 < d) \ne \Pr(a < X_1 < b)\Pr(c < X_2 < d).$$

For instance,

$$\Pr(0 < X_1 < \tfrac{1}{2}, 0 < X_2 < \tfrac{1}{2}) = \int_0^{1/2}\int_0^{1/2} (x_1 + x_2)\,dx_1\,dx_2 = \tfrac{1}{8},$$

whereas

$$\Pr(0 < X_1 < \tfrac{1}{2}) = \int_0^{1/2} (x_1 + \tfrac{1}{2})\,dx_1 = \tfrac{3}{8}$$

and

$$\Pr(0 < X_2 < \tfrac{1}{2}) = \int_0^{1/2} (\tfrac{1}{2} + x_2)\,dx_2 = \tfrac{3}{8}.$$
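The comparison in Example 3 can be carried out exactly. The sketch below uses the antiderivatives of $x_1 + x_2$ directly; `joint_prob` and `marginal_prob` are names introduced here for illustration and apply only to rectangles with a corner at the origin inside the unit square.

```python
from fractions import Fraction


def joint_prob(a, b):
    """Pr(0 < X1 < a, 0 < X2 < b) for f = x1 + x2, with 0 <= a, b <= 1."""
    return a * a * b / 2 + a * b * b / 2


def marginal_prob(a):
    """Pr(0 < X1 < a) for f1 = x1 + 1/2; by symmetry it also serves X2."""
    return a * a / 2 + a / 2


half = Fraction(1, 2)
joint = joint_prob(half, half)
product_of_marginals = marginal_prob(half) * marginal_prob(half)

print(joint, product_of_marginals)
```

The joint probability and the product of the two marginal probabilities differ, as dependence requires.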


Not merely are calculations of some probabilities usually simpler when we have independent random variables, but many expectations, including certain moment-generating functions, have comparably simpler computations. The following result will prove so useful that we state it in the form of a theorem.

Theorem 3. Let the independent random variables $X_1$ and $X_2$ have the marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. The expected value of the product of a function $u(X_1)$ of $X_1$ alone and a function $v(X_2)$ of $X_2$ alone is, subject to their existence, equal to the product of the expected value of $u(X_1)$ and the expected value of $v(X_2)$; that is,

$$E[u(X_1)v(X_2)] = E[u(X_1)]E[v(X_2)].$$

Proof. The independence of $X_1$ and $X_2$ implies that the joint p.d.f. of $X_1$ and $X_2$ is $f_1(x_1) f_2(x_2)$. Thus we have, by definition of expectation, in the continuous case,

$$E[u(X_1)v(X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1)v(x_2) f_1(x_1) f_2(x_2)\,dx_1\,dx_2$$

$$= \left[\int_{-\infty}^{\infty} u(x_1) f_1(x_1)\,dx_1\right] \left[\int_{-\infty}^{\infty} v(x_2) f_2(x_2)\,dx_2\right]$$

$$= E[u(X_1)]E[v(X_2)];$$

or, in the discrete case,

$$E[u(X_1)v(X_2)] = \sum_{x_2}\sum_{x_1} u(x_1)v(x_2) f_1(x_1) f_2(x_2)$$

$$= \left[\sum_{x_1} u(x_1) f_1(x_1)\right] \left[\sum_{x_2} v(x_2) f_2(x_2)\right]$$

$$= E[u(X_1)]E[v(X_2)],$$

as stated in the theorem.

Example 4. Let $X$ and $Y$ be two independent random variables with means $\mu_1$ and $\mu_2$ and positive variances $\sigma_1^2$ and $\sigma_2^2$, respectively. We shall show that the independence of $X$ and $Y$ implies that the correlation coefficient of $X$ and $Y$ is zero. This is true because the covariance of $X$ and $Y$ is equal to

$$E[(X - \mu_1)(Y - \mu_2)] = E(X - \mu_1)E(Y - \mu_2) = 0.$$
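The product rule for expectations, and the zero covariance it implies, can be seen in a small discrete case. In the sketch below the two marginal p.d.f.s and the functions `u` and `v` are hypothetical choices made only for illustration; the joint p.d.f. is taken to be their product, which is what independence means.

```python
from fractions import Fraction

# Hypothetical marginal p.d.f.s of two independent discrete random variables.
f1 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
f2 = {1: Fraction(1, 3), 2: Fraction(2, 3)}


def u(x1):
    return x1 ** 2


def v(x2):
    return 3 * x2 + 1


def expect_joint(w):
    """E[w(X1, X2)] under the product p.d.f. f1(x1) * f2(x2)."""
    return sum(w(x1, x2) * p1 * p2
               for x1, p1 in f1.items()
               for x2, p2 in f2.items())


e_u = sum(u(x1) * p1 for x1, p1 in f1.items())
e_v = sum(v(x2) * p2 for x2, p2 in f2.items())
e_uv = expect_joint(lambda x1, x2: u(x1) * v(x2))

mu1 = sum(x1 * p1 for x1, p1 in f1.items())
mu2 = sum(x2 * p2 for x2, p2 in f2.items())
covariance = expect_joint(lambda x1, x2: (x1 - mu1) * (x2 - mu2))

print(e_u, e_v, e_uv, covariance)
```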

We shall now prove a very useful theorem about independent
random variables. The proof of the theorem relies heavily upon our
assertion that an m.g.f., when it exists, is unique and that it uniquely
determines the distribution of probability.

Theorem 4. Let $X_1$ and $X_2$ denote random variables that have the joint
p.d.f. $f(x_1, x_2)$ and the marginal probability density functions $f_1(x_1)$ and
$f_2(x_2)$, respectively. Furthermore, let $M(t_1, t_2)$ denote the m.g.f. of the
distribution. Then $X_1$ and $X_2$ are independent if and only if

$$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2).$$

Proof. If $X_1$ and $X_2$ are independent, then

$$\begin{aligned}
M(t_1, t_2) &= E\left(e^{t_1X_1 + t_2X_2}\right) \\
&= E\left(e^{t_1X_1}e^{t_2X_2}\right) \\
&= E\left(e^{t_1X_1}\right)E\left(e^{t_2X_2}\right) \\
&= M(t_1, 0)\,M(0, t_2).
\end{aligned}$$

Thus the independence of $X_1$ and $X_2$ implies that the m.g.f. of the joint
distribution factors into the product of the moment-generating
functions of the two marginal distributions.

Suppose next that the m.g.f. of the joint distribution of $X_1$ and $X_2$
is given by $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$. Now $X_1$ has the unique m.g.f.
which, in the continuous case, is given by

$$M(t_1, 0) = \int_{-\infty}^{\infty} e^{t_1x_1}f_1(x_1)\,dx_1.$$

Similarly, the unique m.g.f. of $X_2$, in the continuous case, is given by

$$M(0, t_2) = \int_{-\infty}^{\infty} e^{t_2x_2}f_2(x_2)\,dx_2.$$

Thus we have

$$\begin{aligned}
M(t_1, 0)\,M(0, t_2) &= \left[\int_{-\infty}^{\infty} e^{t_1x_1}f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} e^{t_2x_2}f_2(x_2)\,dx_2\right] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2}f_1(x_1)f_2(x_2)\,dx_1\,dx_2.
\end{aligned}$$

We are given that $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$; so

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2}f_1(x_1)f_2(x_2)\,dx_1\,dx_2.$$

But $M(t_1, t_2)$ is the m.g.f. of $X_1$ and $X_2$. Thus also

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2}f(x_1, x_2)\,dx_1\,dx_2.$$

The uniqueness of the m.g.f. implies that the two distributions of
probability that are described by $f_1(x_1)f_2(x_2)$ and $f(x_1, x_2)$ are the
same. Thus

$$f(x_1, x_2) = f_1(x_1)f_2(x_2).$$

That is, if $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$, then $X_1$ and $X_2$ are independent.
This completes the proof when the random variables are of the
continuous type. With random variables of the discrete type, the
proof is made by using summation instead of integration.
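
The factorization criterion of Theorem 4 can be illustrated for discrete random variables. The sketch below is an illustration under stated assumptions: both joint p.d.f.s are hypothetical, and the m.g.f. is only evaluated at a few points $(t_1, t_2)$, so the check is suggestive rather than a proof of independence or dependence.

```python
import math
from itertools import product

def joint_mgf(pdf, t1, t2):
    """M(t1, t2) = E[exp(t1*X1 + t2*X2)] for a discrete joint p.d.f. (dict)."""
    return sum(p * math.exp(t1 * x1 + t2 * x2) for (x1, x2), p in pdf.items())

# Hypothetical independent pair: the joint p.d.f. is a product of marginals.
f1 = {0: 0.3, 1: 0.7}
f2 = {0: 0.5, 2: 0.5}
independent = {(a, b): f1[a] * f2[b] for a, b in product(f1, f2)}

# Hypothetical dependent pair: all probability sits on the diagonal.
dependent = {(0, 0): 0.5, (1, 1): 0.5}

def factors(pdf, t1, t2):
    """True when M(t1, t2) equals M(t1, 0) * M(0, t2) up to rounding."""
    lhs = joint_mgf(pdf, t1, t2)
    rhs = joint_mgf(pdf, t1, 0.0) * joint_mgf(pdf, 0.0, t2)
    return math.isclose(lhs, rhs, rel_tol=1e-12)

grid = [(-1.0, 0.5), (0.3, 0.3), (1.0, -2.0)]
indep_ok = all(factors(independent, t1, t2) for t1, t2 in grid)
dep_ok = all(factors(dependent, t1, t2) for t1, t2 in grid)

print(indep_ok, dep_ok)
```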

EXERCISES 

2.28. Show that the random variables $X_1$ and $X_2$ with joint p.d.f.
$f(x_1, x_2) = 12x_1x_2(1 - x_2)$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere,
are independent.

2.29. If the random variables $X_1$ and $X_2$ have the joint p.d.f.
$f(x_1, x_2) = 2e^{-x_1 - x_2}$, $0 < x_1 < x_2$, $0 < x_2 < \infty$, zero elsewhere,
show that $X_1$ and $X_2$ are dependent.

2.30. Let $f(x_1, x_2) = \tfrac{1}{16}$, $x_1 = 1, 2, 3, 4$, and $x_2 = 1, 2, 3, 4$, zero elsewhere,
be the joint p.d.f. of $X_1$ and $X_2$. Show that $X_1$ and $X_2$ are independent.

2.31. Find $\Pr(0 < X_1 < \tfrac{1}{3}, 0 < X_2 < \tfrac{1}{3})$ if the random variables $X_1$ and $X_2$ have
the joint p.d.f. $f(x_1, x_2) = 4x_1(1 - x_2)$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero
elsewhere.

2.32. Find the probability of the union of the events $a < X_1 < b$,
$-\infty < X_2 < \infty$ and $-\infty < X_1 < \infty$, $c < X_2 < d$ if $X_1$ and $X_2$ are two
independent variables with $\Pr(a < X_1 < b) = \tfrac{2}{3}$ and $\Pr(c < X_2 < d) = \tfrac{5}{8}$.

2.33. If $f(x_1, x_2) = e^{-x_1 - x_2}$, $0 < x_1 < \infty$, $0 < x_2 < \infty$, zero elsewhere, is the
joint p.d.f. of the random variables $X_1$ and $X_2$, show that $X_1$ and $X_2$ are
independent and that $M(t_1, t_2) = (1 - t_1)^{-1}(1 - t_2)^{-1}$, $t_1 < 1$, $t_2 < 1$. Also
show that

$$E\left(e^{t(X_1 + X_2)}\right) = (1 - t)^{-2}, \qquad t < 1.$$

Accordingly, find the mean and the variance of $Y = X_1 + X_2$.

2.34. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 1/\pi$,
$(x_1 - 1)^2 + (x_2 + 2)^2 < 1$, zero elsewhere. Find $f_1(x_1)$ and $f_2(x_2)$. Are $X_1$ and
$X_2$ independent?

2.35. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 3x$, $0 < y < x < 1$, zero
elsewhere. Are $X$ and $Y$ independent? If not, find $E(X|y)$.

2.36. Suppose that a man leaves for work between 8:00 a.m. and 8:30 a.m. 
and takes between 40 and 50 minutes to get to the office. Let X denote the 
time of departure and let Y denote the time of travel. If we assume that these 
random variables are independent and uniformly distributed, find the 
probability that he arrives at the office before 9:00 a.m. 

2.5 Extension to Several Random Variables 

The notions about two random variables can be extended
immediately to $n$ random variables. We make the following definition
of the space of $n$ random variables.

Definition 3. Consider a random experiment with the sample
space $\mathcal{C}$. Let the random variable $X_i$ assign to each element
$c \in \mathcal{C}$ one and only one real number $X_i(c) = x_i$, $i = 1, 2, \ldots, n$.
The space of these random variables is the set of ordered $n$-tuples
$\mathcal{A} = \{(x_1, x_2, \ldots, x_n) : x_1 = X_1(c), \ldots, x_n = X_n(c), c \in \mathcal{C}\}$. Furthermore,
let $A$ be a subset of $\mathcal{A}$. Then $\Pr[(X_1, \ldots, X_n) \in A] = P(C)$, where
$C = \{c : c \in \mathcal{C} \text{ and } [X_1(c), X_2(c), \ldots, X_n(c)] \in A\}$.

Again we should make the comment that $\Pr[(X_1, \ldots, X_n) \in A]$
could be denoted by the probability set function $P_{X_1, \ldots, X_n}(A)$. But, if
there is no chance of misunderstanding, it will be written simply as
$P(A)$. We say that the $n$ random variables $X_1, X_2, \ldots, X_n$ are of the
discrete type or of the continuous type, and have a distribution of that
type, according as the probability set function $P(A)$, $A \subset \mathcal{A}$, can be
expressed as

$$P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \sum\cdots\sum_{A} f(x_1, \ldots, x_n)$$

or as

$$P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \int\cdots\int_{A} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$

In accordance with the convention of extending the definition of a
p.d.f., it is seen that a point function $f$ essentially satisfies the conditions
of being a p.d.f. if (a) $f$ is defined and is nonnegative for all real values
of its argument(s) and if (b) its integral [for the continuous type of
random variable(s)], or its sum [for the discrete type of random
variable(s)], over all real values of its argument(s) is 1.

The distribution function of the $n$ random variables $X_1, X_2, \ldots, X_n$
is the point function

$$F(x_1, x_2, \ldots, x_n) = \Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n).$$

An illustrative example follows.

Example 1. Let $f(x, y, z) = e^{-(x + y + z)}$, $0 < x, y, z < \infty$, zero elsewhere, be
the p.d.f. of the random variables $X$, $Y$, and $Z$. Then the distribution function
of $X$, $Y$, and $Z$ is given by

$$\begin{aligned}
F(x, y, z) &= \Pr(X \le x, Y \le y, Z \le z) \\
&= \int_0^z\int_0^y\int_0^x e^{-u - v - w}\,du\,dv\,dw \\
&= (1 - e^{-x})(1 - e^{-y})(1 - e^{-z}), \qquad 0 \le x, y, z < \infty,
\end{aligned}$$

and is equal to zero elsewhere. Incidentally, except for a set of probability
measure zero, we have

$$\frac{\partial^3 F(x, y, z)}{\partial x\,\partial y\,\partial z} = f(x, y, z).$$

Let $X_1, X_2, \ldots, X_n$ be random variables having joint p.d.f.
$f(x_1, x_2, \ldots, x_n)$ and let $u(X_1, X_2, \ldots, X_n)$ be a function of these
variables such that the $n$-fold integral

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} u(x_1, x_2, \ldots, x_n)f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n \tag{1}$$

exists, if the random variables are of the continuous type, or such that
the $n$-fold sum

$$\sum_{x_n}\cdots\sum_{x_1} u(x_1, x_2, \ldots, x_n)f(x_1, x_2, \ldots, x_n) \tag{2}$$

exists if the random variables are of the discrete type. The $n$-fold
integral (or the $n$-fold sum, as the case may be) is called the expectation,
denoted by $E[u(X_1, X_2, \ldots, X_n)]$, of the function $u(X_1, X_2, \ldots, X_n)$. In
Section 4.7 we show this expectation to be equal to $E(Y)$, where
$Y = u(X_1, X_2, \ldots, X_n)$. Of course, $E$ is a linear operator.

We shall now discuss the notions of marginal and conditional
probability density functions from the point of view of $n$ random
variables. All of the preceding definitions can be directly generalized
to the case of $n$ variables in the following manner. Let the random
variables $X_1, X_2, \ldots, X_n$ have the joint p.d.f. $f(x_1, x_2, \ldots, x_n)$. If the
random variables are of the continuous type, then by an argument
similar to the two-variable case, we have for every $a < b$,

$$\Pr(a < X_1 < b) = \int_a^b f_1(x_1)\,dx_1,$$

where $f_1(x_1)$ is defined by the $(n - 1)$-fold integral

$$f_1(x_1) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_2 \cdots dx_n.$$

Therefore, $f_1(x_1)$ is the p.d.f. of the one random variable $X_1$ and $f_1(x_1)$
is called the marginal p.d.f. of $X_1$. The marginal probability density
functions $f_2(x_2), \ldots, f_n(x_n)$ of $X_2, \ldots, X_n$, respectively, are similar
$(n - 1)$-fold integrals.

Up to this point, each marginal p.d.f. has been a p.d.f. of one
random variable. It is convenient to extend this terminology to joint
probability density functions, which we shall do now. Here let
$f(x_1, x_2, \ldots, x_n)$ be the joint p.d.f. of the $n$ random variables
$X_1, X_2, \ldots, X_n$, just as before. Now, however, let us take any group of
$k < n$ of these random variables and let us find the joint p.d.f. of
them. This joint p.d.f. is called the marginal p.d.f. of this particular
group of $k$ variables. To fix the ideas, take $n = 6$, $k = 3$, and let us select
the group $X_2, X_4, X_5$. Then the marginal p.d.f. of $X_2, X_4, X_5$ is the joint
p.d.f. of this particular group of three variables, namely,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4, x_5, x_6)\,dx_1\,dx_3\,dx_6,$$

if the random variables are of the continuous type.

Next we extend the definition of a conditional p.d.f. If $f_1(x_1) > 0$,
the symbol $f_{2,\ldots,n|1}(x_2, \ldots, x_n|x_1)$ is defined by the relation

$$f_{2,\ldots,n|1}(x_2, \ldots, x_n|x_1) = \frac{f(x_1, x_2, \ldots, x_n)}{f_1(x_1)},$$

and $f_{2,\ldots,n|1}(x_2, \ldots, x_n|x_1)$ is called the joint conditional p.d.f. of
$X_2, \ldots, X_n$, given $X_1 = x_1$. The joint conditional p.d.f. of any $n - 1$
random variables, say $X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n$, given $X_i = x_i$, is
defined as the joint p.d.f. of $X_1, X_2, \ldots, X_n$ divided by the marginal
p.d.f. $f_i(x_i)$, provided that $f_i(x_i) > 0$. More generally, the joint
conditional p.d.f. of $n - k$ of the random variables, for given values of
the remaining $k$ variables, is defined as the joint p.d.f. of the $n$ variables
divided by the marginal p.d.f. of the particular group of $k$ variables,
provided that the latter p.d.f. is positive. We remark that there are
many other conditional probability density functions; for instance, see
Exercise 2.18.

Because a conditional p.d.f. is a p.d.f. of a certain number of
random variables, the expectation of a function of these random
variables has been defined. To emphasize the fact that a conditional
p.d.f. is under consideration, such expectations are called conditional
expectations. For instance, the conditional expectation of
$u(X_2, \ldots, X_n)$, given $X_1 = x_1$, is, for random variables of the continuous
type, given by

$$E[u(X_2, \ldots, X_n)|x_1] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} u(x_2, \ldots, x_n)\,f_{2,\ldots,n|1}(x_2, \ldots, x_n|x_1)\,dx_2 \cdots dx_n,$$

provided $f_1(x_1) > 0$ and the integral converges (absolutely). If the
random variables are of the discrete type, conditional expectations are,
of course, computed by using sums instead of integrals.

Let the random variables $X_1, X_2, \ldots, X_n$ have the joint p.d.f.
$f(x_1, x_2, \ldots, x_n)$ and the marginal probability density functions
$f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$, respectively. The definition of the
independence of $X_1$ and $X_2$ is generalized to the mutual independence
of $X_1, X_2, \ldots, X_n$ as follows: The random variables $X_1, X_2, \ldots, X_n$
are said to be mutually independent if and only if

$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)f_2(x_2) \cdots f_n(x_n).$$

It follows immediately from this definition of the mutual independence
of $X_1, X_2, \ldots, X_n$ that

$$\begin{aligned}
\Pr(a_1 < X_1 < b_1, a_2 < X_2 < b_2, \ldots, a_n < X_n < b_n)
&= \Pr(a_1 < X_1 < b_1)\Pr(a_2 < X_2 < b_2) \cdots \Pr(a_n < X_n < b_n) \\
&= \prod_{i=1}^{n} \Pr(a_i < X_i < b_i),
\end{aligned}$$

where the symbol $\prod_{i=1}^{n} \varphi(i)$ is defined to be

$$\prod_{i=1}^{n} \varphi(i) = \varphi(1)\varphi(2) \cdots \varphi(n).$$

The theorem that

$$E[u(X_1)v(X_2)] = E[u(X_1)]\,E[v(X_2)]$$

for independent random variables $X_1$ and $X_2$ becomes, for mutually
independent random variables $X_1, X_2, \ldots, X_n$,

$$E[u_1(X_1)u_2(X_2) \cdots u_n(X_n)] = E[u_1(X_1)]\,E[u_2(X_2)] \cdots E[u_n(X_n)],$$

or

$$E\left[\prod_{i=1}^{n} u_i(X_i)\right] = \prod_{i=1}^{n} E[u_i(X_i)].$$

The moment-generating function of the joint distribution of $n$
random variables $X_1, X_2, \ldots, X_n$ is defined as follows. Let

$$E[\exp(t_1X_1 + t_2X_2 + \cdots + t_nX_n)]$$

exist for $-h_i < t_i < h_i$, $i = 1, 2, \ldots, n$, where each $h_i$ is positive. This
expectation is denoted by $M(t_1, t_2, \ldots, t_n)$ and it is called the m.g.f.
of the joint distribution of $X_1, \ldots, X_n$ (or simply the m.g.f. of
$X_1, \ldots, X_n$). As in the cases of one and two variables, this m.g.f.
is unique and uniquely determines the joint distribution of the $n$
variables (and hence all marginal distributions). For example, the
m.g.f. of the marginal distribution of $X_i$ is $M(0, \ldots, 0, t_i, 0, \ldots, 0)$,
$i = 1, 2, \ldots, n$; that of the marginal distribution of $X_i$ and $X_j$ is
$M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0)$; and so on. Theorem 4 of this
chapter can be generalized, and the factorization

$$M(t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} M(0, \ldots, 0, t_i, 0, \ldots, 0)$$

is a necessary and sufficient condition for the mutual independence of
$X_1, X_2, \ldots, X_n$.

Remark. If $X_1$, $X_2$, and $X_3$ are mutually independent, they are pairwise
independent (that is, $X_i$ and $X_j$, $i \ne j$, where $i, j = 1, 2, 3$, are independent).
However, the following example, due to S. Bernstein, shows that pairwise
independence does not necessarily imply mutual independence. Let $X_1$, $X_2$,
and $X_3$ have the joint p.d.f.

$$f(x_1, x_2, x_3) = \tfrac{1}{4}, \qquad (x_1, x_2, x_3) \in \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\},$$

$= 0$ elsewhere.

The joint p.d.f. of $X_i$ and $X_j$, $i \ne j$, is

$$f_{ij}(x_i, x_j) = \tfrac{1}{4}, \qquad (x_i, x_j) \in \{(0, 0), (1, 0), (0, 1), (1, 1)\},$$

$= 0$ elsewhere,
whereas the marginal p.d.f. of $X_i$ is

$$f_i(x_i) = \tfrac{1}{2}, \qquad x_i = 0, 1,$$

$= 0$ elsewhere.

Obviously, if $i \ne j$, we have

$$f_{ij}(x_i, x_j) = f_i(x_i)f_j(x_j),$$

and thus $X_i$ and $X_j$ are independent. However,

$$f(x_1, x_2, x_3) \ne f_1(x_1)f_2(x_2)f_3(x_3).$$

Thus $X_1$, $X_2$, and $X_3$ are not mutually independent.
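
Bernstein's example is small enough to verify by direct enumeration. The following sketch, with helper names (`marginal`, `pairwise_independent`, `mutually_independent`) that are illustrative and not from the text, computes the marginal p.d.f.s from the joint p.d.f. and compares them with the required products using exact fractions.

```python
from fractions import Fraction
from itertools import product

quarter = Fraction(1, 4)
# Joint p.d.f. of Bernstein's example: mass 1/4 on each of four points.
joint = {(1, 0, 0): quarter, (0, 1, 0): quarter,
         (0, 0, 1): quarter, (1, 1, 1): quarter}

def marginal(indices):
    """Marginal p.d.f. of the coordinates listed in `indices`."""
    out = {}
    for point, p in joint.items():
        key = tuple(point[i] for i in indices)
        out[key] = out.get(key, 0) + p
    return out

singles = [marginal((i,)) for i in range(3)]

def pairwise_independent():
    """Check f_ij(x_i, x_j) = f_i(x_i) f_j(x_j) for every pair i != j."""
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        pair = marginal((i, j))
        for a, b in product((0, 1), repeat=2):
            if pair.get((a, b), 0) != singles[i][(a,)] * singles[j][(b,)]:
                return False
    return True

def mutually_independent():
    """Check f(x1, x2, x3) = f_1(x1) f_2(x2) f_3(x3) at every point."""
    for point in product((0, 1), repeat=3):
        expected = (singles[0][(point[0],)]
                    * singles[1][(point[1],)]
                    * singles[2][(point[2],)])
        if joint.get(point, 0) != expected:
            return False
    return True

print(pairwise_independent(), mutually_independent())
```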

Example 2. Let $X_1$, $X_2$, and $X_3$ be three mutually independent random
variables and let each have the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. The
joint p.d.f. of $X_1, X_2, X_3$ is $f(x_1)f(x_2)f(x_3) = 8x_1x_2x_3$, $0 < x_i < 1$, $i = 1, 2, 3$,
zero elsewhere. Then, for illustration, the expected value of $5X_1X_2^3 + 3X_2X_3^4$
is

$$\int_0^1\int_0^1\int_0^1 (5x_1x_2^3 + 3x_2x_3^4)\,8x_1x_2x_3\,dx_1\,dx_2\,dx_3 = 2.$$

Let $Y$ be the maximum of $X_1$, $X_2$, and $X_3$. Then, for instance, we have

$$\begin{aligned}
\Pr\left(Y \le \tfrac{1}{2}\right) &= \Pr\left(X_1 \le \tfrac{1}{2}, X_2 \le \tfrac{1}{2}, X_3 \le \tfrac{1}{2}\right) \\
&= \int_0^{1/2}\int_0^{1/2}\int_0^{1/2} 8x_1x_2x_3\,dx_1\,dx_2\,dx_3 \\
&= \left(\tfrac{1}{2}\right)^6 = \tfrac{1}{64}.
\end{aligned}$$

In a similar manner, we find that the distribution function of $Y$ is

$$G(y) = \Pr(Y \le y) = \begin{cases} 0, & y < 0, \\ y^6, & 0 \le y < 1, \\ 1, & 1 \le y. \end{cases}$$

Accordingly, the p.d.f. of $Y$ is

$$g(y) = 6y^5, \qquad 0 < y < 1,$$

$= 0$ elsewhere.
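
The two numerical results of Example 2 can be reproduced with exact arithmetic. This is a minimal sketch: the helper names `moment`, `F`, and `G` are illustrative, and the closed forms in the docstrings come from integrating the p.d.f. $f(x) = 2x$ on $(0, 1)$.

```python
from fractions import Fraction

def moment(k):
    """E[X^k] for f(x) = 2x on (0, 1): the integral of 2 x^(k+1) is 2/(k+2)."""
    return Fraction(2, k + 2)

# Mutual independence lets each expectation factor into one-variable moments.
expected = 5 * moment(1) * moment(3) + 3 * moment(1) * moment(4)

def F(x):
    """Distribution function of a single X_i for 0 <= x <= 1."""
    return Fraction(x) ** 2

def G(y):
    """Distribution function of Y = max(X1, X2, X3) for 0 <= y <= 1."""
    return F(y) ** 3

print(expected, G(Fraction(1, 2)))
```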

Remark. Unless there is a possible misunderstanding between mutual and
pairwise independence, we usually drop the modifier mutual. Accordingly,
using this practice in Example 2, we say that $X_1$, $X_2$, $X_3$ are independent
random variables, meaning that they are mutually independent. Occasionally,
for emphasis, we use mutually independent so that the reader is reminded that
this is different from pairwise independence.


EXERCISES 

2.37. Let $X$, $Y$, $Z$ have joint p.d.f. $f(x, y, z) = 2(x + y + z)/3$, $0 < x < 1$,
$0 < y < 1$, $0 < z < 1$, zero elsewhere.

(a) Find the marginal probability density functions.

(b) Compute $\Pr(0 < X < \tfrac{1}{2}, 0 < Y < \tfrac{1}{2}, 0 < Z < \tfrac{1}{2})$ and
$\Pr(0 < X < \tfrac{1}{2}) = \Pr(0 < Y < \tfrac{1}{2}) = \Pr(0 < Z < \tfrac{1}{2})$.

(c) Are $X$, $Y$, and $Z$ independent?

(d) Calculate $E(X^2YZ + 3XY^4Z^2)$.

(e) Determine the distribution function of $X$, $Y$, and $Z$.

(f) Find the conditional distribution of $X$ and $Y$, given $Z = z$, and evaluate
$E(X + Y|z)$.

(g) Determine the conditional distribution of $X$, given $Y = y$ and $Z = z$,
and compute $E(X|y, z)$.

2.38. Let $f(x_1, x_2, x_3) = \exp[-(x_1 + x_2 + x_3)]$, $0 < x_1 < \infty$, $0 < x_2 < \infty$,
$0 < x_3 < \infty$, zero elsewhere, be the joint p.d.f. of $X_1$, $X_2$, $X_3$.

(a) Compute $\Pr(X_1 < X_2 < X_3)$ and $\Pr(X_1 = X_2 < X_3)$.

(b) Determine the m.g.f. of $X_1$, $X_2$, and $X_3$. Are these random variables
independent?

2.39. Let $X_1$, $X_2$, $X_3$, and $X_4$ be four independent random variables, each with
p.d.f. $f(x) = 3(1 - x)^2$, $0 < x < 1$, zero elsewhere. If $Y$ is the minimum of
these four variables, find the distribution function and the p.d.f. of $Y$.

2.40. A fair die is cast at random three independent times. Let the random
variable $X_i$ be equal to the number of spots that appear on the $i$th trial,
$i = 1, 2, 3$. Let the random variable $Y$ be equal to $\max(X_i)$. Find the
distribution function and the p.d.f. of $Y$.

Hint: $\Pr(Y \le y) = \Pr(X_i \le y, i = 1, 2, 3)$.

2.41. Let $M(t_1, t_2, t_3)$ be the m.g.f. of the random variables $X_1$, $X_2$, and
$X_3$ of Bernstein's example, described in the remark preceding Example
2 of this section. Show that $M(t_1, t_2, 0) = M(t_1, 0, 0)M(0, t_2, 0)$,
$M(t_1, 0, t_3) = M(t_1, 0, 0)M(0, 0, t_3)$, $M(0, t_2, t_3) = M(0, t_2, 0)M(0, 0, t_3)$,
but $M(t_1, t_2, t_3) \ne M(t_1, 0, 0)M(0, t_2, 0)M(0, 0, t_3)$. Thus $X_1$, $X_2$, $X_3$ are
pairwise independent but not mutually independent.

2.42. Let $X_1$, $X_2$, and $X_3$ be three random variables with means, variances, and
correlation coefficients, denoted by $\mu_1, \mu_2, \mu_3$; $\sigma_1^2, \sigma_2^2, \sigma_3^2$; and $\rho_{12}, \rho_{13}, \rho_{23}$,
respectively. If $E(X_1 - \mu_1|x_2, x_3) = b_2(x_2 - \mu_2) + b_3(x_3 - \mu_3)$, where $b_2$ and
$b_3$ are constants, determine $b_2$ and $b_3$ in terms of the variances and the
correlation coefficients.

ADDITIONAL EXERCISES 

2.43. Find $\Pr[X_1X_2 < 2]$, where $X_1$ and $X_2$ are independent and each has the
distribution with p.d.f. $f(x) = 1$, $1 < x < 2$, zero elsewhere.

2.44. Let the joint p.d.f. of $X$ and $Y$ be given by $f(x, y) = \dfrac{2}{(1 + x + y)^3}$,
$0 < x < \infty$, $0 < y < \infty$, zero elsewhere.

(a) Compute the marginal p.d.f. of $X$ and the conditional p.d.f. of $Y$,
given $X = x$.

(b) For a fixed $X = x$, compute $E(1 + x + Y|x)$ and use the result to
compute $E(Y|x)$.

2.45. Let $X_1$, $X_2$, $X_3$ be independent and each have a distribution with p.d.f.
$f(x) = \exp(-x)$, $0 < x < \infty$, zero elsewhere. Evaluate:

(a) $\Pr(X_1 < X_2 | X_1 < 2X_2)$.

(b) $\Pr(X_1 < X_2 < X_3 | X_3 < 1)$.

2.46. Let $X$ and $Y$ be random variables with space consisting of the four
points: $(0, 0)$, $(1, 1)$, $(1, 0)$, $(1, -1)$. Assign positive probabilities to these
four points so that the correlation coefficient is equal to zero. Are $X$ and
$Y$ independent?

2.47. Two line segments, each of length 2 units, are placed along the
x-axis. The midpoint of the first is between $x = 0$ and $x = 14$ and that of the
second is between $x = 6$ and $x = 20$. Assuming independence and uniform
distributions for these midpoints, find the probability that the line segments
overlap.

2.48. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = \tfrac{1}{7}$, $(x, y) = (0, 0), (1, 0), (0, 1),
(1, 1), (2, 1), (1, 2), (2, 2)$, and zero elsewhere. Find the correlation
coefficient $\rho$.

2.49. Let $X_1$ and $X_2$ have the joint p.d.f. described by the following table:

(x_1, x_2)     (0,0)   (0,1)   (0,2)   (1,1)   (1,2)   (2,2)
f(x_1, x_2)    1/12    2/12    1/12    3/12    4/12    1/12

Find $f_1(x_1)$, $f_2(x_2)$, $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

2.50. If the discrete random variables $X_1$ and $X_2$ have joint p.d.f.
$f(x_1, x_2) = (3x_1 + x_2)/24$, $(x_1, x_2) = (1, 1), (1, 2), (2, 1), (2, 2)$, zero elsewhere,
find the conditional mean $E(X_2|x_1)$, when $x_1 = 1$.

2.51. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 21x^2y^3$, $0 < x < y < 1$, zero
elsewhere. Find the conditional mean $E(Y|x)$ of $Y$, given $X = x$.

2.52. Let $X_1$ and $X_2$ have the p.d.f. $f(x_1, x_2) = x_1 + x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$,
zero elsewhere. Evaluate $\Pr(X_1/X_2 < 2)$.

2.53. Cast a fair die and let $X = 0$ if 1, 2, or 3 spots appear, let $X = 1$ if 4 or
5 spots appear, and let $X = 2$ if 6 spots appear. Do this two independent
times, obtaining $X_1$ and $X_2$. Calculate $\Pr(|X_1 - X_2| = 1)$.

2.54. Let $\sigma_1^2 = \sigma_2^2 = \sigma^2$ be the common variance of $X_1$ and $X_2$ and let $\rho$ be the
correlation coefficient of $X_1$ and $X_2$. Show that

$$\Pr[|(X_1 - \mu_1) + (X_2 - \mu_2)| \ge k\sigma] \le \frac{2(1 + \rho)}{k^2}.$$



CHAPTER 3

Some Special Distributions


3.1 The Binomial and Related Distributions 

In Chapter 1 we introduced the uniform distribution and the 
hypergeometric distribution. In this chapter we discuss some other 
important distributions of random variables frequently used in 
statistics. We begin with the binomial and related distributions. 

A Bernoulli experiment is a random experiment, the outcome of 
which can be classified in but one of two mutually exclusive and 
exhaustive ways, say, success or failure (e.g., female or male, life or 
death, nondefective or defective). A sequence of Bernoulli trials occurs 
when a Bernoulli experiment is performed several independent times 
so that the probability of success, say p, remains the same from trial 
to trial. That is, in such a sequence, we let p denote the probability of 
success on each trial. 

Let $X$ be a random variable associated with a Bernoulli trial by
defining it as follows:

$$X(\text{success}) = 1 \quad \text{and} \quad X(\text{failure}) = 0.$$

That is, the two outcomes, success and failure, are denoted by one and
zero, respectively. The p.d.f. of $X$ can be written as

$$f(x) = p^x(1 - p)^{1 - x}, \qquad x = 0, 1,$$

and we say that $X$ has a Bernoulli distribution. The expected value of
$X$ is

$$\mu = E(X) = \sum_{x=0}^{1} x\,p^x(1 - p)^{1 - x} = (0)(1 - p) + (1)(p) = p,$$

and the variance of $X$ is

$$\begin{aligned}
\sigma^2 = \operatorname{var}(X) &= \sum_{x=0}^{1} (x - p)^2 p^x(1 - p)^{1 - x} \\
&= p^2(1 - p) + (1 - p)^2 p = p(1 - p).
\end{aligned}$$

It follows that the standard deviation of $X$ is $\sigma = \sqrt{p(1 - p)}$.

In a sequence of $n$ Bernoulli trials, we shall let $X_i$ denote the
Bernoulli random variable associated with the $i$th trial. An observed
sequence of $n$ Bernoulli trials will then be an $n$-tuple of zeros and ones.
In such a sequence of Bernoulli trials, we are often interested in the total
number of successes and not in the order of their occurrence. If we let
the random variable $X$ equal the number of observed successes in $n$
Bernoulli trials, the possible values of $X$ are $0, 1, 2, \ldots, n$. If $x$ successes
occur, where $x = 0, 1, 2, \ldots, n$, then $n - x$ failures occur. The number
of ways of selecting $x$ positions for the $x$ successes in the $n$ trials is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$$

Since the trials are independent and since the probabilities of success
and failure on each trial are, respectively, $p$ and $1 - p$, the probability
of each of these ways is $p^x(1 - p)^{n - x}$. Thus the p.d.f. of $X$, say $f(x)$, is
the sum of the probabilities of these $\binom{n}{x}$ mutually exclusive events; that
is,

$$f(x) = \binom{n}{x} p^x(1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n,$$

$= 0$ elsewhere.




Recall, if $n$ is a positive integer, that

$$(a + b)^n = \sum_{x=0}^{n} \binom{n}{x} b^x a^{n - x}.$$

Thus it is clear that $f(x) \ge 0$ and that

$$\sum_{x} f(x) = \sum_{x=0}^{n} \binom{n}{x} p^x(1 - p)^{n - x} = [(1 - p) + p]^n = 1.$$

That is, $f(x)$ satisfies the conditions of being a p.d.f. of a random
variable $X$ of the discrete type. A random variable $X$ that has a p.d.f.
of the form of $f(x)$ is said to have a binomial distribution, and any such
$f(x)$ is called a binomial p.d.f. A binomial distribution will be denoted
by the symbol $b(n, p)$. The constants $n$ and $p$ are called the parameters
of the binomial distribution. Thus, if we say that $X$ is $b(5, \tfrac{1}{3})$, we mean
that $X$ has the binomial p.d.f.

$$f(x) = \binom{5}{x}\left(\tfrac{1}{3}\right)^x\left(\tfrac{2}{3}\right)^{5 - x}, \qquad x = 0, 1, \ldots, 5,$$

$= 0$ elsewhere.

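The m.g.f. of a binomial distribution is easily found. It is

$$\begin{aligned}
M(t) = \sum_{x} e^{tx}f(x) &= \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^x(1 - p)^{n - x} \\
&= \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x(1 - p)^{n - x} \\
&= [(1 - p) + pe^t]^n
\end{aligned}$$

for all real values of $t$. The mean $\mu$ and the variance $\sigma^2$ of $X$ may be
computed from $M(t)$. Since

$$M'(t) = n[(1 - p) + pe^t]^{n - 1}(pe^t)$$

and

$$M''(t) = n[(1 - p) + pe^t]^{n - 1}(pe^t) + n(n - 1)[(1 - p) + pe^t]^{n - 2}(pe^t)^2,$$

it follows that

$$\mu = M'(0) = np$$

and

$$\sigma^2 = M''(0) - \mu^2 = np + n(n - 1)p^2 - (np)^2 = np(1 - p).$$

Example 1. Let $X$ be the number of heads (successes) in $n = 7$ independent
tosses of an unbiased coin. The p.d.f. of $X$ is

$$f(x) = \binom{7}{x}\left(\tfrac{1}{2}\right)^x\left(1 - \tfrac{1}{2}\right)^{7 - x}, \qquad x = 0, 1, 2, \ldots, 7,$$

$= 0$ elsewhere.

Then $X$ has the m.g.f.

$$M(t) = \left(\tfrac{1}{2} + \tfrac{1}{2}e^t\right)^7,$$

has mean $\mu = np = \tfrac{7}{2}$, and has variance $\sigma^2 = np(1 - p) = \tfrac{7}{4}$. Furthermore, we
have

$$\Pr(0 \le X \le 1) = \sum_{x=0}^{1} f(x) = \frac{1}{128} + \frac{7}{128} = \frac{8}{128}$$

and

$$\Pr(X = 5) = f(5) = \frac{7!}{5!\,2!}\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^2 = \frac{21}{128}.$$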
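
The quantities in Example 1 can be recomputed directly from the binomial p.d.f. The following is a minimal sketch using exact fractions; the function name `binomial_pdf` is illustrative and not from the text.

```python
from fractions import Fraction
from math import comb

def binomial_pdf(n, p):
    """Return {x: f(x)} for the b(n, p) distribution, using exact fractions."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

n, p = 7, Fraction(1, 2)
f = binomial_pdf(n, p)

total = sum(f.values())                                # should equal 1
mean = sum(x * fx for x, fx in f.items())              # should equal n*p
variance = sum((x - mean) ** 2 * fx for x, fx in f.items())  # n*p*(1-p)

print(total, mean, variance, f[0] + f[1], f[5])
```

Computing the mean and variance from the p.d.f. rather than from $M(t)$ gives an independent check on the formulas $\mu = np$ and $\sigma^2 = np(1 - p)$.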


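Example 2. If the m.g.f. of a random variable $X$ is

$$M(t) = \left(\tfrac{2}{3} + \tfrac{1}{3}e^t\right)^5,$$

then $X$ has a binomial distribution with $n = 5$ and $p = \tfrac{1}{3}$; that is, the p.d.f. of
$X$ is

$$f(x) = \binom{5}{x}\left(\tfrac{1}{3}\right)^x\left(\tfrac{2}{3}\right)^{5 - x}, \qquad x = 0, 1, 2, \ldots, 5,$$

$= 0$ elsewhere.

Here $\mu = np = \tfrac{5}{3}$ and $\sigma^2 = np(1 - p) = \tfrac{10}{9}$.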

Example 3. If $Y$ is $b(n, \tfrac{1}{3})$, then $\Pr(Y \ge 1) = 1 - \Pr(Y = 0) = 1 - (\tfrac{2}{3})^n$.
Suppose that we wish to find the smallest value of $n$ that yields
$\Pr(Y \ge 1) > 0.80$. We have $1 - (\tfrac{2}{3})^n > 0.80$ and $0.20 > (\tfrac{2}{3})^n$. Either by
inspection or by use of logarithms, we see that $n = 4$ is the solution. That is,
the probability of at least one success throughout $n = 4$ independent
repetitions of a random experiment with probability of success $p = \tfrac{1}{3}$ is greater
than 0.80.
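
The "inspection" step of Example 3 amounts to increasing $n$ until the inequality holds. A minimal sketch of that search follows; the function name `smallest_n` is illustrative, and exact fractions are used so that the comparison with 0.80 is not affected by rounding.

```python
from fractions import Fraction

def smallest_n(p, target):
    """Smallest n with Pr(Y >= 1) = 1 - (1 - p)^n strictly greater than target."""
    n = 1
    while 1 - (1 - p) ** n <= target:
        n += 1
    return n

n = smallest_n(Fraction(1, 3), Fraction(4, 5))
print(n, 1 - Fraction(2, 3) ** n)
```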

Example 4. Let the random variable $Y$ be equal to the number of
successes throughout $n$ independent repetitions of a random experiment
with probability $p$ of success. That is, $Y$ is $b(n, p)$. The ratio $Y/n$ is called the
relative frequency of success. For every $\epsilon > 0$, we have

$$\Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) = \Pr(|Y - np| \ge \epsilon n) = \Pr\left(|Y - \mu| \ge \epsilon\sqrt{\frac{n}{p(1 - p)}}\,\sigma\right),$$

where $\mu = np$ and $\sigma^2 = np(1 - p)$. In accordance with Chebyshev's inequality
with $k = \epsilon\sqrt{n/p(1 - p)}$, we have

$$\Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) \le \frac{p(1 - p)}{n\epsilon^2}$$

and hence

$$\Pr\left(\left|\frac{Y}{n} - p\right| < \epsilon\right) \ge 1 - \frac{p(1 - p)}{n\epsilon^2}.$$

Now, for every fixed $\epsilon > 0$, the right-hand member of the first of these
inequalities is close to zero for sufficiently large $n$. That is,

$$\lim_{n \to \infty} \Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) = 0$$

and

$$\lim_{n \to \infty} \Pr\left(\left|\frac{Y}{n} - p\right| < \epsilon\right) = 1.$$

Since this is true for every fixed $\epsilon > 0$, we see, in a certain sense, that the relative
frequency of success is, for large values of $n$, close to the probability $p$ of
success. This result is one form of the law of large numbers. It was alluded to
in the initial discussion of probability in Chapter 1 and will be considered
again, along with related concepts, in Chapter 5.
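
The Chebyshev bound of Example 4 can be compared with the exact binomial tail probability. The sketch below is an illustration only: the values $p = \tfrac{1}{2}$, $\epsilon = \tfrac{1}{10}$, and the sample sizes are hypothetical choices, and the function names are not from the text.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, p, eps):
    """Exact Pr(|Y/n - p| >= eps) for Y distributed b(n, p)."""
    return sum(
        comb(n, y) * p**y * (1 - p)**(n - y)
        for y in range(n + 1)
        if abs(Fraction(y, n) - p) >= eps
    )

def chebyshev_bound(n, p, eps):
    """Upper bound p(1 - p) / (n * eps^2) from Chebyshev's inequality."""
    return p * (1 - p) / (n * eps**2)

p, eps = Fraction(1, 2), Fraction(1, 10)
rows = [(n, tail_probability(n, p, eps), chebyshev_bound(n, p, eps))
        for n in (25, 100, 400)]

for n, exact, bound in rows:
    print(n, float(exact), float(bound))
```

For these values the exact tail probability is well below the bound, which is typical: Chebyshev's inequality holds for any distribution with a finite variance and is therefore usually conservative.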

Example 5. Let the independent random variables $X_1$, $X_2$, $X_3$ have the
same distribution function $F(x)$. Let $Y$ be the middle value of $X_1$, $X_2$, $X_3$. To
determine the distribution function of $Y$, say $G(y) = \Pr(Y \le y)$, we note that
$Y \le y$ if and only if at least two of the random variables $X_1$, $X_2$, $X_3$ are less
than or equal to $y$. Let us say that the $i$th “trial” is a success if $X_i \le y$,
$i = 1, 2, 3$; here each “trial” has the probability of success $F(y)$. In this
terminology, $G(y) = \Pr(Y \le y)$ is then the probability of at least two
successes in three independent trials. Thus

$$G(y) = \binom{3}{2}[F(y)]^2[1 - F(y)] + [F(y)]^3.$$

If $F(x)$ is a continuous type of distribution function so that the p.d.f. of $X$ is
$F'(x) = f(x)$, then the p.d.f. of $Y$ is

$$g(y) = G'(y) = 6[F(y)][1 - F(y)]f(y).$$
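
The differentiation step in Example 5 can be checked numerically. The sketch below assumes, purely for illustration, the distribution function $F(x) = x^2$ on $[0, 1]$ (so $f(x) = 2x$); it compares a central-difference approximation of $G'(y)$ with the stated formula for $g(y)$.

```python
def F(x):
    """Hypothetical distribution function on [0, 1] (an assumption)."""
    return x * x

def f(x):
    """The p.d.f. corresponding to F, that is F'(x)."""
    return 2.0 * x

def G(y):
    """Distribution function of the middle value of three observations."""
    Fy = F(y)
    return 3.0 * Fy**2 * (1.0 - Fy) + Fy**3

def g(y):
    """The p.d.f. of the middle value, as given in the text."""
    return 6.0 * F(y) * (1.0 - F(y)) * f(y)

def numerical_derivative(func, y, h=1e-6):
    """Central-difference approximation of func'(y)."""
    return (func(y + h) - func(y - h)) / (2.0 * h)

points = [0.3, 0.5, 0.8]
errors = [abs(numerical_derivative(G, y) - g(y)) for y in points]
print(errors)
```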

Example 6. Consider a sequence of independent repetitions of a random 
experiment with constant probability p of success. Let the random variable 
Y denote the total number of failures in this sequence before the rth success; 





that is, $Y + r$ is equal to the number of trials necessary to produce exactly $r$
successes. Here $r$ is a fixed positive integer. To determine the p.d.f. of $Y$, let
$y$ be an element of $\{y : y = 0, 1, 2, \ldots\}$. Then, by the multiplication rule of
probabilities, $\Pr(Y = y) = g(y)$ is equal to the product of the probability

$$\binom{y + r - 1}{r - 1} p^{r - 1}(1 - p)^y$$

of obtaining exactly $r - 1$ successes in the first $y + r - 1$ trials and the
probability $p$ of a success on the $(y + r)$th trial. Thus the p.d.f. of $Y$ is
given by

$$g(y) = \binom{y + r - 1}{r - 1} p^r(1 - p)^y, \qquad y = 0, 1, 2, \ldots,$$

$= 0$ elsewhere.

A distribution with a p.d.f. of the form $g(y)$ is called a negative binomial
distribution; and any such $g(y)$ is called a negative binomial p.d.f. The
distribution derives its name from the fact that $g(y)$ is a general term in
the expansion of $p^r[1 - (1 - p)]^{-r}$. It is left as an exercise to show that the
m.g.f. of this distribution is $M(t) = p^r[1 - (1 - p)e^t]^{-r}$, for $t < -\ln(1 - p)$.
If $r = 1$, then $Y$ has the p.d.f.

$$g(y) = p(1 - p)^y, \qquad y = 0, 1, 2, \ldots,$$

zero elsewhere, and the m.g.f. $M(t) = p[1 - (1 - p)e^t]^{-1}$. In this special case,
$r = 1$, we say that $Y$ has a geometric distribution.
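
The negative binomial p.d.f. and its m.g.f. can be checked against each other by summing the series numerically. This is a minimal sketch under stated assumptions: the parameter values $r = 3$, $p = 0.4$, $t = 0.2$ are hypothetical (with $t < -\ln(1 - p) \approx 0.51$, as required), and the infinite sums are truncated after 500 terms, which leaves a negligible remainder for these values.

```python
import math
from math import comb

def neg_binomial_pdf(y, r, p):
    """g(y): probability of y failures before the r-th success."""
    return comb(y + r - 1, r - 1) * p**r * (1 - p)**y

r, p, t = 3, 0.4, 0.2   # hypothetical values; t must be below -ln(1 - p)
terms = 500             # truncation point for the infinite sums

total = sum(neg_binomial_pdf(y, r, p) for y in range(terms))
mgf_series = sum(math.exp(t * y) * neg_binomial_pdf(y, r, p) for y in range(terms))
mgf_closed = p**r * (1 - (1 - p) * math.exp(t)) ** (-r)

print(total, mgf_series, mgf_closed)
```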

The binomial distribution is generalized to the multinomial
distribution as follows. Let a random experiment be repeated $n$
independent times. On each repetition, the experiment terminates in
but one of $k$ mutually exclusive and exhaustive ways, say
$C_1, C_2, \ldots, C_k$. Let $p_i$ be the probability that the outcome is an element
of $C_i$ and let $p_i$ remain constant throughout the $n$ independent
repetitions, $i = 1, 2, \ldots, k$. Define the random variable $X_i$ to be equal
to the number of outcomes that are elements of $C_i$, $i = 1, 2, \ldots,
k - 1$. Furthermore, let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers so
that $x_1 + x_2 + \cdots + x_{k-1} \le n$. Then the probability that exactly
$x_1$ terminations of the experiment are in $C_1$, ..., exactly $x_{k-1}$
terminations are in $C_{k-1}$, and hence exactly $n - (x_1 + \cdots + x_{k-1})$
terminations are in $C_k$ is

$$\frac{n!}{x_1! \cdots x_{k-1}!\,x_k!}\,p_1^{x_1} \cdots p_{k-1}^{x_{k-1}}\,p_k^{x_k},$$

where $x_k$ is merely an abbreviation for $n - (x_1 + \cdots + x_{k-1})$. This is
the multinomial p.d.f. of $k - 1$ random variables $X_1, X_2, \ldots, X_{k-1}$ of
the discrete type. To see that this is correct, note that the number of
distinguishable arrangements of $x_1$ $C_1$'s, $x_2$ $C_2$'s, ..., $x_k$ $C_k$'s is

$$\frac{n!}{x_1!\,x_2! \cdots x_k!}$$

and that the probability of each of these distinguishable arrangements
is

$$p_1^{x_1}p_2^{x_2} \cdots p_k^{x_k}.$$

Hence the product of these two latter expressions gives the correct
probability, which is in agreement with the formula for the multinomial
p.d.f.

When $k = 3$, we often let $X = X_1$ and $Y = X_2$; then
$n - X - Y = X_3$. We say that $X$ and $Y$ have a trinomial distribution.
The joint p.d.f. of $X$ and $Y$ is

$$f(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\,p_1^x p_2^y p_3^{n - x - y},$$

where $x$ and $y$ are nonnegative integers with $x + y \le n$, and $p_1$, $p_2$,
and $p_3$ are positive proper fractions with $p_1 + p_2 + p_3 = 1$; and let
$f(x, y) = 0$ elsewhere. Accordingly, $f(x, y)$ satisfies the conditions of
being a joint p.d.f. of two random variables $X$ and $Y$ of the discrete
type; that is, $f(x, y)$ is nonnegative and its sum over all points $(x, y)$
at which $f(x, y)$ is positive is equal to $(p_1 + p_2 + p_3)^n = 1$.

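If $n$ is a positive integer and $a_1$, $a_2$, $a_3$ are fixed constants, we have

$$\begin{aligned}
\sum_{x=0}^{n}\sum_{y=0}^{n-x} \frac{n!}{x!\,y!\,(n - x - y)!}\,a_1^x a_2^y a_3^{n - x - y}
&= \sum_{x=0}^{n} \frac{n!\,a_1^x}{x!\,(n - x)!} \sum_{y=0}^{n-x} \frac{(n - x)!}{y!\,(n - x - y)!}\,a_2^y a_3^{n - x - y} \\
&= \sum_{x=0}^{n} \frac{n!}{x!\,(n - x)!}\,a_1^x(a_2 + a_3)^{n - x} \\
&= (a_1 + a_2 + a_3)^n. \qquad (1)
\end{aligned}$$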

Consequently, the m.g.f. of a trinomial distribution, in accordance
with Equation (1), is given by

$$\begin{aligned}
M(t_1, t_2) &= \sum_{x=0}^{n}\sum_{y=0}^{n-x} \frac{n!}{x!\,y!\,(n - x - y)!}\,(p_1e^{t_1})^x(p_2e^{t_2})^y p_3^{n - x - y} \\
&= (p_1e^{t_1} + p_2e^{t_2} + p_3)^n,
\end{aligned}$$

for all real values of $t_1$ and $t_2$. The moment-generating functions of the
marginal distributions of $X$ and $Y$ are, respectively,

$$M(t_1, 0) = (p_1e^{t_1} + p_2 + p_3)^n = [(1 - p_1) + p_1e^{t_1}]^n$$

and

$$M(0, t_2) = (p_1 + p_2e^{t_2} + p_3)^n = [(1 - p_2) + p_2e^{t_2}]^n.$$

We see immediately, from Theorem 4, Section 2.4, that $X$ and
$Y$ are dependent random variables. In addition, $X$ is $b(n, p_1)$ and
$Y$ is $b(n, p_2)$. Accordingly, the means and the variances of $X$
and $Y$ are, respectively, $\mu_1 = np_1$, $\mu_2 = np_2$, $\sigma_1^2 = np_1(1 - p_1)$, and
$\sigma_2^2 = np_2(1 - p_2)$.
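
These statements about the trinomial distribution can be verified by enumeration for a small case. The sketch below uses hypothetical parameter values ($n = 4$, $p_1 = \tfrac{1}{2}$, $p_2 = \tfrac{1}{3}$) and illustrative function names; it checks that the p.d.f. sums to 1, that the marginal distribution of $X$ is $b(n, p_1)$, that the m.g.f. matches the closed form, and that the m.g.f. does not factor at one test point.

```python
from fractions import Fraction
from math import comb, exp, factorial, isclose

def trinomial_pdf(n, p1, p2):
    """Joint p.d.f. f(x, y) of the trinomial distribution, as exact fractions."""
    p3 = 1 - p1 - p2
    pdf = {}
    for x in range(n + 1):
        for y in range(n - x + 1):
            coeff = Fraction(factorial(n),
                             factorial(x) * factorial(y) * factorial(n - x - y))
            pdf[(x, y)] = coeff * p1**x * p2**y * p3**(n - x - y)
    return pdf

# Hypothetical parameter values, chosen only for illustration.
n, p1, p2 = 4, Fraction(1, 2), Fraction(1, 3)
f = trinomial_pdf(n, p1, p2)

total = sum(f.values())
marginal_x = {x: sum(p for (a, _), p in f.items() if a == x) for x in range(n + 1)}
binomial_x = {x: comb(n, x) * p1**x * (1 - p1)**(n - x) for x in range(n + 1)}

def M(t1, t2):
    """M(t1, t2) computed directly from the joint p.d.f."""
    return sum(float(p) * exp(t1 * x + t2 * y) for (x, y), p in f.items())

t1, t2 = 0.3, -0.2
closed_form = (float(p1) * exp(t1) + float(p2) * exp(t2) + float(1 - p1 - p2)) ** n
product_of_marginals = M(t1, 0.0) * M(0.0, t2)

print(total, M(t1, t2), closed_form, product_of_marginals)
```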

Consider next the conditional p.d.f. of $Y$, given $X = x$. We have

$$f_{2|1}(y|x) = \frac{(n - x)!}{y!\,(n - x - y)!}\left(\frac{p_2}{1 - p_1}\right)^y\left(\frac{p_3}{1 - p_1}\right)^{n - x - y}, \qquad y = 0, 1, \ldots, n - x,$$

$= 0$ elsewhere.

Thus the conditional distribution of $Y$, given $X = x$, is $b[n - x,
p_2/(1 - p_1)]$. Hence the conditional mean of $Y$, given $X = x$, is the
linear function

$$E(Y|x) = (n - x)\left(\frac{p_2}{1 - p_1}\right).$$

We also find that the conditional distribution of $X$, given $Y = y$, is
$b[n - y, p_1/(1 - p_2)]$ and thus

$$E(X|y) = (n - y)\left(\frac{p_1}{1 - p_2}\right).$$

Now recall (Example 2, Section 2.3) that the square of the correlation
coefficient, say $\rho^2$, is equal to the product of $-p_2/(1 - p_1)$ and
$-p_1/(1 - p_2)$, the coefficients of $x$ and $y$ in the respective conditional
means. Since both of these coefficients are negative (and thus $\rho$ is
negative), we have

$$\rho = -\sqrt{\frac{p_1p_2}{(1 - p_1)(1 - p_2)}}.$$
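
The conditional mean and the correlation coefficient of the trinomial distribution can be confirmed by enumeration. The sketch below is an illustration with hypothetical parameter values ($n = 5$, $p_1 = \tfrac{1}{2}$, $p_2 = \tfrac{1}{5}$); it works with $\rho^2$ and the sign of the covariance so that every comparison stays in exact fractions.

```python
from fractions import Fraction
from math import factorial

def trinomial_pdf(n, p1, p2):
    """Joint p.d.f. f(x, y) of the trinomial distribution, as exact fractions."""
    p3 = 1 - p1 - p2
    pdf = {}
    for x in range(n + 1):
        for y in range(n - x + 1):
            coeff = Fraction(factorial(n),
                             factorial(x) * factorial(y) * factorial(n - x - y))
            pdf[(x, y)] = coeff * p1**x * p2**y * p3**(n - x - y)
    return pdf

# Hypothetical parameter values, chosen only for illustration.
n, p1, p2 = 5, Fraction(1, 2), Fraction(1, 5)
f = trinomial_pdf(n, p1, p2)

def expect(g):
    """E[g(X, Y)] under the joint p.d.f. f."""
    return sum(g(x, y) * p for (x, y), p in f.items())

mu1, mu2 = expect(lambda x, y: x), expect(lambda x, y: y)
var1 = expect(lambda x, y: (x - mu1) ** 2)
var2 = expect(lambda x, y: (y - mu2) ** 2)
cov = expect(lambda x, y: (x - mu1) * (y - mu2))
rho_squared = cov**2 / (var1 * var2)

def cond_mean_y(x):
    """E(Y | x) computed directly from the joint p.d.f."""
    fx = sum(p for (a, _), p in f.items() if a == x)
    return sum(y * p for (a, y), p in f.items() if a == x) / fx

print(rho_squared, cov, [cond_mean_y(x) for x in range(n + 1)])
```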



In general, the m.g.f. of a multinomial distribution is given by

$$M(t_1, \ldots, t_{k-1}) = (p_1e^{t_1} + \cdots + p_{k-1}e^{t_{k-1}} + p_k)^n$$

for all real values of $t_1, t_2, \ldots, t_{k-1}$. Thus each one-variable marginal
p.d.f. is binomial, each two-variable marginal p.d.f. is trinomial, and
so on.

EXERCISES 

3.1. If the m.g.f. of a random variable is (5 +1^) 5 , find Pr (Jf = 2 or 3). 

3.2. The m.g.f. of a random variable A" is (3 + |〆) 9 . Show that 

Pr(fi-2a <X<h + 2g)= ^ 

X = 

3.3. If X is b(n, p), show that 



3.4. Let the independent random variables $X_1, X_2, X_3$ have the same p.d.f.
$f(x) = 3x^2$, 0 < x < 1, zero elsewhere. Find the probability that exactly two
of these three variables exceed

3.5. Let Y be the number of successes in n independent repetitions of a
random experiment having the probability of success p = \. If n = 3,
compute Pr (2 ≤ Y); if n = 5, compute Pr (3 ≤ Y).

3.6. Let Y be the number of successes throughout n independent repetitions
of a random experiment having probability of success p = ]. Determine the
smallest value of n so that Pr (1 ≤ Y) ≥ 0.70.

3.7. Let the independent random variables X' and X 2 have binomial 
distributions with parameters «! = 3 , />, = f and n 2 = 4, p 2 = respectively. 
Compute Pr (X ] = X 2 ). 

Hint: List the four mutually exclusive ways that X t = X 2 and compute 
the probability of each. 

3.8. Toss two nickels and three dimes at random. Make appropriate 
assumptions and compute the probability that there are more heads 
showing on the nickels than on the dimes. 

3.9. Let $X_1, X_2, \ldots, X_{k-1}$ have a multinomial distribution.

(a) Find the m.g.f. of $X_2, X_3, \ldots, X_{k-1}$.

(b) What is the p.d.f. of $X_2, X_3, \ldots, X_{k-1}$?

(c) Determine the conditional p.d.f. of $X_1$, given that
$X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$.

(d) What is the conditional expectation $E(X_1 | x_2, \ldots, x_{k-1})$?

3.10. Let X be b(2, p) and let Y be b(4, p). If Pr (X ≥ 1) = |, find Pr (Y ≥ 1).

3.20. Let $f(x_1, x_2)$, $x_2 = 0, 1, \ldots, x_1$, $x_1 = 1, 2, 3, 4, 5$,
zero elsewhere, be the joint p.d.f. of $X_1$ and $X_2$. Determine:

(a) $E(X_2)$.

(b) $u(x_1) = E(X_2 | x_1)$.

(c) $E[u(X_1)]$.

Compare the answers of parts (a) and (c).


3.11. If x = r is the unique mode of a distribution that is b(n, p), show that

(n + 1)p - 1 < r < (n + 1)p.

Hint: Determine the values of x for which the ratio f(x + 1)/f(x) > 1.

3.12. Let X have a binomial distribution with parameters n and p = 
Determine the smallest integer n can be such that Pr (X ≥ 1) ≥ 0.85.

3.13. Let X have the p.d.f. J\x) = 0(f)*，x = 0, 1 ， 2, 3, ... ， zero elsewhere. 
Find the conditional p.d.f. of X, given that A" > 3. 

3.14. One of the numbers 1, 2, ..., 6 is to be chosen by casting an unbiased
die. Let this random experiment be repeated five independent times. Let the
random variable $X_1$ be the number of terminations in the set {x : x = 1, 2, 3}
and let the random variable $X_2$ be the number of terminations in the set
{x : x = 4, 5}. Compute Pr ($X_1$ = 2, $X_2$ = 1).

3.15. Show that the m.g.f. of the negative binomial distribution is
$M(t) = p^r[1 - (1 - p)e^t]^{-r}$. Find the mean and the variance of this
distribution.

Hint: In the summation representing M(t), make use of the Maclaurin's
series for $(1 - w)^{-r}$.

3.16. Let $X_1$ and $X_2$ have a trinomial distribution. Differentiate the
moment-generating function to show that their covariance is $-np_1p_2$.

3.17. If a fair coin is tossed at random five independent times, find the 
conditional probability of five heads relative to the hypothesis that there 
are at least four heads. 

3.18. Let an unbiased die be cast at random seven independent times.
Compute the conditional probability that each side appears at least once 
relative to the hypothesis that side 1 appears exactly twice. 

3.19. Compute the measures of skewness and kurtosis of the binomial 
distribution b(n, p). 

Hint for Exercise 3.20: Note that

$$E(X_2) = \sum_{x_1=1}^{5} \sum_{x_2=0}^{x_1} x_2 f(x_1, x_2)$$

and use the fact that

$$\sum_{y=0}^{n} y \binom{n}{y} \left(\frac{1}{2}\right)^{n} = \frac{n}{2}.$$

Why?

3.21. Three fair dice are cast. In 10 independent casts, let X be the number
of times all three faces are alike and let Y be the number of times only two
faces are alike. Find the joint p.d.f. of X and Y and compute E(6XY).


3.2 The Poisson Distribution 


Recall that the series

$$1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots = \sum_{x=0}^{\infty} \frac{m^x}{x!}$$

converges, for all values of m, to $e^m$. Consider the function f(x) defined
by

$$f(x) = \frac{m^x e^{-m}}{x!}, \quad x = 0, 1, 2, \ldots,$$

= 0 elsewhere,

where m > 0. Since m > 0, then f(x) ≥ 0 and

$$\sum_{x} f(x) = \sum_{x=0}^{\infty} \frac{m^x e^{-m}}{x!} = e^{-m} \sum_{x=0}^{\infty} \frac{m^x}{x!} = e^{-m} e^{m} = 1;$$

that is, f(x) satisfies the conditions of being a p.d.f. of a discrete type
of random variable. A random variable that has a p.d.f. of the form
f(x) is said to have a Poisson distribution, and any such f(x) is called
a Poisson p.d.f.

Remarks. Experience indicates that the Poisson p.d.f. may be used in a 
number of applications with quite satisfactory results. For example, let the 
random variable X denote the number of alpha particles emitted by a 
radioactive substance that enter a prescribed region during a prescribed 
interval of time. With a suitable value of m, it is found that X may be 
assumed to have a Poisson distribution. Again let the random variable X 
denote the number of defects on a manufactured article, such as a 
refrigerator door. Upon examining many of these doors, it is found, with an
appropriate value of m, that X may be said to have a Poisson distribution. The
number of automobile accidents in some unit of time (or the number of
insurance claims in some unit of time) is often assumed to be a random
variable which has a Poisson distribution. Each of these instances can be
thought of as a process that generates a number of changes (accidents,
claims, etc.) in a fixed interval (of time or space, etc.). If a process leads to
a Poisson distribution, that process is called a Poisson process. Some 
assumptions that ensure a Poisson process will now be enumerated. 

Let g(x, w) denote the probability of x changes in each interval of length
w. Furthermore, let the symbol o(h) represent any function such that
$\lim_{h \to 0} [o(h)/h] = 0$; for example, $h^2 = o(h)$ and o(h) + o(h) = o(h). The Poisson
postulates are the following:

1. $g(1, h) = \lambda h + o(h)$, where $\lambda$ is a positive constant and h > 0.

2. $\sum_{x=2}^{\infty} g(x, h) = o(h)$.

3. The numbers of changes in nonoverlapping intervals are independent.

Postulates 1 and 3 state, in effect, that the probability of one change in a
short interval h is independent of changes in other nonoverlapping intervals
and is approximately proportional to the length of the interval. The substance
of postulate 2 is that the probability of two or more changes in the same short
interval h is essentially equal to zero. If x = 0, we take g(0, 0) = 1. In
accordance with postulates 1 and 2, the probability of at least one change in
an interval of length h is $\lambda h + o(h) + o(h) = \lambda h + o(h)$. Hence the probability
of zero changes in this interval of length h is $1 - \lambda h - o(h)$. Thus the
probability g(0, w + h) of zero changes in an interval of length w + h is, in
accordance with postulate 3, equal to the product of the probability g(0, w)
of zero changes in an interval of length w and the probability $[1 - \lambda h - o(h)]$
of zero changes in a nonoverlapping interval of length h. That is,

$$g(0, w + h) = g(0, w)[1 - \lambda h - o(h)].$$

Then

$$\frac{g(0, w + h) - g(0, w)}{h} = -\lambda g(0, w) - \frac{o(h)\, g(0, w)}{h}.$$

If we take the limit as h → 0, we have

$$D_w[g(0, w)] = -\lambda g(0, w).$$

The solution of this differential equation is

$$g(0, w) = c e^{-\lambda w}.$$

The condition g(0, 0) = 1 implies that c = 1; so

$$g(0, w) = e^{-\lambda w}.$$

If x is a positive integer, we take g(x, 0) = 0. The postulates imply that

$$g(x, w + h) = [g(x, w)][1 - \lambda h - o(h)] + [g(x - 1, w)][\lambda h + o(h)] + o(h).$$

Accordingly, we have

$$\frac{g(x, w + h) - g(x, w)}{h} = -\lambda g(x, w) + \lambda g(x - 1, w) + \frac{o(h)}{h}$$

and

$$D_w[g(x, w)] = -\lambda g(x, w) + \lambda g(x - 1, w),$$

for x = 1, 2, 3, .... It can be shown, by mathematical induction, that the
solutions to these differential equations, with boundary conditions g(x, 0) = 0
for x = 1, 2, 3, ..., are, respectively,

$$g(x, w) = \frac{(\lambda w)^x e^{-\lambda w}}{x!}, \quad x = 1, 2, 3, \ldots.$$

Hence the number of changes X in an interval of length w has a Poisson
distribution with parameter $m = \lambda w$.
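
As a hedged illustration (not part of the original text), the sketch below checks numerically that $g(x, w) = (\lambda w)^x e^{-\lambda w}/x!$ satisfies the differential equations $D_w[g(x, w)] = -\lambda g(x, w) + \lambda g(x - 1, w)$ and the boundary conditions. The derivative is approximated by a central difference; the values of `lam`, `w`, and `h` are assumptions chosen only for the demonstration.

```python
from math import exp, factorial

def g(x, w, lam):
    """Probability of x changes in an interval of length w."""
    return (lam * w) ** x * exp(-lam * w) / factorial(x)

lam, w, h = 1.5, 2.0, 1e-6  # illustrative values, not from the text

def residual(x):
    """Central-difference derivative in w minus the right-hand side."""
    derivative = (g(x, w + h, lam) - g(x, w - h, lam)) / (2 * h)
    previous = lam * g(x - 1, w, lam) if x >= 1 else 0.0
    return derivative - (-lam * g(x, w, lam) + previous)

residuals = [residual(x) for x in range(6)]
print(residuals)  # each entry should be close to zero
```

A finite-difference check of this kind cannot replace the induction argument; it only confirms that the stated solution is consistent with the equations at the chosen point.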

The m.g.f. of a Poisson distribution is given by

$$M(t) = \sum_{x} e^{tx} f(x) = \sum_{x=0}^{\infty} e^{tx} \frac{m^x e^{-m}}{x!} = e^{-m} \sum_{x=0}^{\infty} \frac{(m e^{t})^x}{x!} = e^{-m} e^{m e^{t}} = e^{m(e^{t} - 1)}$$

for all real values of t. Since

$$M'(t) = e^{m(e^{t} - 1)}(m e^{t})$$

and

$$M''(t) = e^{m(e^{t} - 1)}(m e^{t}) + e^{m(e^{t} - 1)}(m e^{t})^2,$$

then

$$\mu = M'(0) = m$$

and

$$\sigma^2 = M''(0) - \mu^2 = m + m^2 - m^2 = m.$$

That is, a Poisson distribution has $\mu = \sigma^2 = m > 0$. On this account,
a Poisson p.d.f. is frequently written

$$f(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots,$$

= 0 elsewhere.

Thus the parameter m in a Poisson p.d.f. is the mean $\mu$. Table I in
Appendix B gives approximately the distribution for various values of
the parameter $m = \mu$.
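
The equality of the mean and the variance can also be checked numerically. The sketch below is an illustration added here, not the text's method: it sums the Poisson p.d.f. directly, truncating the series at an assumed cutoff `x_max`, for an arbitrarily chosen m = 2.5.

```python
from math import exp

def poisson_pmf_table(m, x_max):
    """Return [f(0), ..., f(x_max)] using the recursion f(x) = f(x-1) * m / x."""
    probs = [exp(-m)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * m / x)
    return probs

m = 2.5                             # illustrative parameter, not from the text
probs = poisson_pmf_table(m, 100)   # terms beyond x = 100 are negligible here

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
print(total, mean, variance)  # approximately 1, m, m
```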


Example 1. Suppose that X has a Poisson distribution with $\mu = 2$. Then
the p.d.f. of X is

$$f(x) = \frac{2^x e^{-2}}{x!}, \quad x = 0, 1, 2, \ldots,$$

= 0 elsewhere.

The variance of this distribution is $\sigma^2 = \mu = 2$. If we wish to compute
Pr (1 ≤ X), we have

$$\Pr(1 \le X) = 1 - \Pr(X = 0) = 1 - f(0) = 1 - e^{-2} = 0.865,$$

approximately, by Table I of Appendix B.

Example 2. If the m.g.f. of a random variable X is

$$M(t) = e^{4(e^{t} - 1)},$$

then X has a Poisson distribution with $\mu = 4$. Accordingly, by way of example,

$$\Pr(X = 3) = \frac{4^3 e^{-4}}{3!} = \frac{32}{3} e^{-4};$$

or, by Table I,

$$\Pr(X = 3) = \Pr(X \le 3) - \Pr(X \le 2) = 0.433 - 0.238 = 0.195.$$
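
For readers without Table I at hand, the probabilities in Examples 1 and 2 can be reproduced directly from the Poisson p.d.f. The following sketch is an added illustration; the function names are assumptions, and the results agree with the tabled values to three decimal places.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Poisson p.d.f. mu^x e^(-mu) / x!."""
    return mu ** x * exp(-mu) / factorial(x)

def poisson_cdf(x, mu):
    """Pr(X <= x), the quantity tabulated in a table such as Table I."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

example_1 = 1 - poisson_pmf(0, 2)                    # Pr(1 <= X) with mu = 2
example_2 = poisson_pmf(3, 4)                        # Pr(X = 3) with mu = 4
example_2_table = poisson_cdf(3, 4) - poisson_cdf(2, 4)

print(round(example_1, 3), round(example_2, 3), round(example_2_table, 3))
```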

Example 3. Let the probability of exactly one blemish in 1 foot of wire be
about 1/1000 and let the probability of two or more blemishes in that length be,
for all practical purposes, zero. Let the random variable X be the number of
blemishes in 3000 feet of wire. If we assume the independence of the numbers
of blemishes in nonoverlapping intervals, then the postulates of the Poisson
process are approximated, with $\lambda = 1/1000$ and w = 3000. Thus X has an
approximate Poisson distribution with mean 3000(1/1000) = 3. For example, the
probability that there are exactly five blemishes in 3000 feet of wire is

$$\Pr(X = 5) = \frac{3^5 e^{-3}}{5!}$$

and by Table I,

$$\Pr(X = 5) = \Pr(X \le 5) - \Pr(X \le 4) = 0.101,$$

approximately.

EXERCISES

3.22. If the random variable X has a Poisson distribution such that
Pr (X = 1) = Pr (X = 2), find Pr (X = 4).

3.23. The m.g.f. of a random variable X is $e^{4(e^{t} - 1)}$. Show that
Pr ($\mu - 2\sigma < X < \mu + 2\sigma$) = 0.931.

3.24. In a lengthy manuscript, it is discovered that only 13.5 percent of the 
pages contain no typing errors. If we assume that the number of errors per 
page is a random variable with a Poisson distribution, find the percentage 
of pages that have exactly one error. 

3.25. Let the p.d.f. f(x) be positive on and only on the nonnegative integers.
Given that f(x) = (4/x) f(x - 1), x = 1, 2, 3, .... Find f(x).

Hint: Note that f(1) = 4f(0), f(2) = ($4^2$/2!) f(0), and so on. That is, find
each f(x) in terms of f(0) and then determine f(0) from

1 = f(0) + f(1) + f(2) + ···.

3.26. Let X have a Poisson distribution with $\mu = 100$. Use Chebyshev's
inequality to determine a lower bound for Pr (75 < X < 125).

3.27. Given that g(x, 0) = 0 and that

$$D_w[g(x, w)] = -\lambda g(x, w) + \lambda g(x - 1, w)$$

for x = 1, 2, 3, .... If $g(0, w) = e^{-\lambda w}$, show, by mathematical induction,
that

$$g(x, w) = \frac{(\lambda w)^x e^{-\lambda w}}{x!}, \quad x = 1, 2, 3, \ldots.$$

3.28. Let the number of chocolate drops in a certain type of cookie have a 
Poisson distribution. We want the probability that a cookie of this type 
contains at least two chocolate drops to be greater than 0.99. Find the 
smallest value that the mean of the distribution can take.

3.29. Compute the measures of skewness and kurtosis of the Poisson
distribution with mean $\mu$.

3.30. On the average a grocer sells 3 of a certain article per week. How many 
of these should he have in stock so that the chance of his running out within 
a week will be less than 0.01? Assume a Poisson distribution. 

3.31. Let X have a Poisson distribution. If Pr (X = 1) = Pr (X = 3), find the
mode of the distribution.

3.32. Let X have a Poisson distribution with mean 1. Compute, if it exists,
the expected value E(X!).

3.33. Let X and Y have the joint p.d.f. $f(x, y) = e^{-2}/[x!\,(y - x)!]$,
y = 0, 1, 2, ...; x = 0, 1, ..., y, zero elsewhere.

(a) Find the m.g.f. $M(t_1, t_2)$ of this joint distribution.

(b) Compute the means, the variances, and the correlation coefficient of X
and Y.

(c) Determine the conditional mean E(X|y).

Hint: Note that

$$\sum_{x=0}^{y} [\exp(t_1 x)] \frac{y!}{x!\,(y - x)!} = [1 + \exp(t_1)]^{y}.$$

Why?


3.3 The Gamma and Chi-Square Distributions 

In this section we introduce the gamma and chi-square distributions.
It is proved in books on advanced calculus that the integral

$$\int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy$$

exists for $\alpha > 0$ and that the value of the integral is a positive number.
The integral is called the gamma function of $\alpha$, and we write

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy.$$

If $\alpha = 1$, clearly

$$\Gamma(1) = \int_0^{\infty} e^{-y}\, dy = 1.$$

If $\alpha > 1$, an integration by parts shows that

$$\Gamma(\alpha) = (\alpha - 1) \int_0^{\infty} y^{\alpha - 2} e^{-y}\, dy = (\alpha - 1)\Gamma(\alpha - 1).$$

Accordingly, if $\alpha$ is a positive integer greater than 1,

$$\Gamma(\alpha) = (\alpha - 1)(\alpha - 2) \cdots (3)(2)(1)\Gamma(1) = (\alpha - 1)!.$$

Since $\Gamma(1) = 1$, this suggests that we take 0! = 1, as we have done.
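
A brief numerical check of these two properties, added here as an illustration: Python's standard `math.gamma` evaluates $\Gamma(\alpha)$, so the recursion $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ and the identity $\Gamma(\alpha) = (\alpha - 1)!$ can be compared directly. The non-integer value 3.7 is an arbitrary choice.

```python
from math import gamma, factorial, isclose

# Gamma(alpha) = (alpha - 1)! for positive integers alpha = 1, ..., 10.
integer_checks = [isclose(gamma(a), factorial(a - 1)) for a in range(1, 11)]

# The recursion also holds for non-integer alpha > 1.
alpha = 3.7  # arbitrary illustrative value
lhs = gamma(alpha)
rhs = (alpha - 1) * gamma(alpha - 1)

print(all(integer_checks), lhs, rhs)
```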

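In the integral that defines $\Gamma(\alpha)$, let us introduce a new variable x
by writing $y = x/\beta$, where $\beta > 0$. Then

$$\Gamma(\alpha) = \int_0^{\infty} \left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-x/\beta} \left(\frac{1}{\beta}\right) dx,$$

or, equivalently,

$$1 = \int_0^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}\, dx.$$

Since $\alpha > 0$, $\beta > 0$, and $\Gamma(\alpha) > 0$, we see that

$$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}, \quad 0 < x < \infty,$$

= 0 elsewhere,

is a p.d.f. of a random variable of the continuous type. A random
variable X that has a p.d.f. of this form is said to have a gamma
distribution with parameters $\alpha$ and $\beta$, and any such f(x) is called a
gamma-type p.d.f.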

Remark. The gamma distribution is frequently the probability model for
waiting times; for instance, in life testing, the waiting time until "death" is the
random variable which frequently has a gamma distribution. To see this, let
us assume the postulates of a Poisson process and let the interval of length
w be a time interval. Specifically, let the random variable W be the time that
is needed to obtain exactly k changes (possibly deaths), where k is a fixed
positive integer. Then the distribution function of W is

$$G(w) = \Pr(W \le w) = 1 - \Pr(W > w).$$

However, the event W > w, for w > 0, is equivalent to the event in which there
are fewer than k changes in a time interval of length w. That is, if the random
variable X is the number of changes in an interval of length w, then

$$\Pr(W > w) = \sum_{x=0}^{k-1} \Pr(X = x) = \sum_{x=0}^{k-1} \frac{(\lambda w)^x e^{-\lambda w}}{x!}.$$

It is left as an exercise to verify that

$$\int_{\lambda w}^{\infty} \frac{z^{k-1} e^{-z}}{(k - 1)!}\, dz = \sum_{x=0}^{k-1} \frac{(\lambda w)^x e^{-\lambda w}}{x!}.$$

If, momentarily, we accept this result, we have, for w > 0,

$$G(w) = 1 - \int_{\lambda w}^{\infty} \frac{z^{k-1} e^{-z}}{\Gamma(k)}\, dz = \int_0^{\lambda w} \frac{z^{k-1} e^{-z}}{\Gamma(k)}\, dz,$$

and for w ≤ 0, G(w) = 0. If we change the variable of integration in the
integral that defines G(w) by writing $z = \lambda y$, then

$$G(w) = \int_0^{w} \frac{\lambda^k y^{k-1} e^{-\lambda y}}{\Gamma(k)}\, dy, \quad w > 0,$$

and G(w) = 0, w ≤ 0. Accordingly, the p.d.f. of W is

$$g(w) = G'(w) = \frac{\lambda^k w^{k-1} e^{-\lambda w}}{\Gamma(k)}, \quad 0 < w < \infty,$$

= 0 elsewhere.

That is, W has a gamma distribution with $\alpha = k$ and $\beta = 1/\lambda$. If W is the
waiting time until the first change, that is, if k = 1, the p.d.f. of W is

$$g(w) = \lambda e^{-\lambda w}, \quad 0 < w < \infty,$$

= 0 elsewhere,

and W is said to have an exponential distribution with mean $\beta = 1/\lambda$.
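
The relationship used in this Remark can be checked numerically. The sketch below, an added illustration rather than the text's derivation, computes G(w) in two ways for assumed values k = 3, $\lambda$ = 0.5, w = 4: once as one minus the Poisson sum, and once by integrating the gamma p.d.f. with a composite Simpson rule (a numerical method chosen here for convenience).

```python
from math import exp, factorial

def prob_fewer_than_k_changes(k, lam, w):
    """Pr(W > w): fewer than k changes in an interval of length w."""
    m = lam * w
    return sum(m ** x * exp(-m) / factorial(x) for x in range(k))

def waiting_time_pdf(w, k, lam):
    """Gamma p.d.f. with alpha = k (a positive integer) and beta = 1/lam."""
    return lam ** k * w ** (k - 1) * exp(-lam * w) / factorial(k - 1)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

k, lam, w = 3, 0.5, 4.0  # illustrative values, not from the text
cdf_from_poisson = 1 - prob_fewer_than_k_changes(k, lam, w)
cdf_from_density = simpson(lambda u: waiting_time_pdf(u, k, lam), 0.0, w)
print(cdf_from_poisson, cdf_from_density)
```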


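We now find the m.g.f. of a gamma distribution. Since

$$M(t) = \int_0^{\infty} e^{tx} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}\, dx = \int_0^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x(1 - \beta t)/\beta}\, dx,$$

we may set $y = x(1 - \beta t)/\beta$, $t < 1/\beta$, or $x = \beta y/(1 - \beta t)$, to obtain

$$M(t) = \int_0^{\infty} \frac{\beta/(1 - \beta t)}{\Gamma(\alpha)\beta^{\alpha}} \left(\frac{\beta y}{1 - \beta t}\right)^{\alpha - 1} e^{-y}\, dy.$$

That is,

$$M(t) = \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_0^{\infty} \frac{1}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-y}\, dy = \frac{1}{(1 - \beta t)^{\alpha}}, \quad t < \frac{1}{\beta}.$$

Now

$$M'(t) = (-\alpha)(1 - \beta t)^{-\alpha - 1}(-\beta)$$

and

$$M''(t) = (-\alpha)(-\alpha - 1)(1 - \beta t)^{-\alpha - 2}(-\beta)^2.$$

Hence, for a gamma distribution, we have

$$\mu = M'(0) = \alpha\beta$$

and

$$\sigma^2 = M''(0) - \mu^2 = \alpha(\alpha + 1)\beta^2 - \alpha^2\beta^2 = \alpha\beta^2.$$

Example 1. Let the waiting time W have a gamma p.d.f. with $\alpha = k$ and
$\beta = 1/\lambda$. Accordingly, $E(W) = k/\lambda$. If k = 1, then $E(W) = 1/\lambda$; that is, the
expected waiting time for k = 1 changes is equal to the reciprocal of $\lambda$.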
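
As an added, hedged illustration of how moments are read off an m.g.f., the sketch below differentiates $M(t) = (1 - \beta t)^{-\alpha}$ numerically at t = 0 and compares the results with $\alpha\beta$ and $\alpha\beta^2$. The finite-difference step `h` and the parameter values are assumptions made only for the demonstration.

```python
def gamma_mgf(t, alpha, beta):
    """M(t) = (1 - beta*t)^(-alpha), valid for t < 1/beta."""
    return (1.0 - beta * t) ** (-alpha)

alpha, beta, h = 3.0, 2.0, 1e-4  # illustrative values, not from the text

# Central differences approximate M'(0) and M''(0).
m1 = (gamma_mgf(h, alpha, beta) - gamma_mgf(-h, alpha, beta)) / (2 * h)
m2 = (gamma_mgf(h, alpha, beta) - 2 * gamma_mgf(0.0, alpha, beta)
      + gamma_mgf(-h, alpha, beta)) / h ** 2

mean = m1               # should be close to alpha * beta
variance = m2 - m1 ** 2  # should be close to alpha * beta**2
print(mean, variance)
```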

Example 2. Let X be a random variable such that

$$E(X^m) = \frac{(m + 3)!}{3!}\, 3^m, \quad m = 1, 2, 3, \ldots.$$

Then the m.g.f. of X is given by the series

$$M(t) = 1 + \frac{4!\, 3}{3!\, 1!}\, t + \frac{5!\, 3^2}{3!\, 2!}\, t^2 + \frac{6!\, 3^3}{3!\, 3!}\, t^3 + \cdots.$$

This, however, is the Maclaurin's series for $(1 - 3t)^{-4}$, provided that
-1 < 3t < 1. Accordingly, X has a gamma distribution with $\alpha = 4$ and $\beta = 3$.

Remark. The gamma distribution is not only a good model for waiting 
times, but one for many nonnegative random variables of the continuous type. 
For illustration, the distribution of certain incomes could be modeled 
satisfactorily by the gamma distribution, since the two parameters $\alpha$ and $\beta$
provide a great deal of flexibility. Several gamma probability density functions 
are depicted in Figure 3.1. 

Let us now consider the special case of the gamma distribution in
which $\alpha = r/2$, where r is a positive integer, and $\beta = 2$. A random
variable X of the continuous type that has the p.d.f.

$$f(x) = \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, x^{r/2 - 1} e^{-x/2}, \quad 0 < x < \infty,$$

= 0 elsewhere,

and the m.g.f.

$$M(t) = (1 - 2t)^{-r/2}, \quad t < \tfrac{1}{2},$$

is said to have a chi-square distribution, and any f(x) of this form is
called a chi-square p.d.f. The mean and the variance of a chi-square
distribution are $\mu = \alpha\beta = (r/2)2 = r$ and $\sigma^2 = \alpha\beta^2 = (r/2)2^2 = 2r$,
respectively. For no obvious reason, we call the parameter r the number
of degrees of freedom of the chi-square distribution (or of
the chi-square p.d.f.). Because the chi-square distribution has an
important role in statistics and occurs so frequently, we write, for
brevity, that X is $\chi^2(r)$ to mean that the random variable X has a
chi-square distribution with r degrees of freedom.

Example 3. If X has the p.d.f.

$$f(x) = \tfrac{1}{4}\, x e^{-x/2}, \quad 0 < x < \infty,$$

= 0 elsewhere,

then X is $\chi^2(4)$. Hence $\mu = 4$, $\sigma^2 = 8$, and $M(t) = (1 - 2t)^{-2}$, $t < \tfrac{1}{2}$.

Example 4. If X has the m.g.f. $M(t) = (1 - 2t)^{-8}$, $t < \tfrac{1}{2}$, then X is $\chi^2(16)$.


If the random variable X is $\chi^2(r)$, then, with $c_1 < c_2$, we have

$$\Pr(c_1 \le X \le c_2) = \Pr(X \le c_2) - \Pr(X \le c_1),$$

since Pr ($X = c_1$) = 0. To compute such a probability, we need the
value of an integral like

$$\Pr(X \le x) = \int_0^{x} \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, w^{r/2 - 1} e^{-w/2}\, dw.$$

[FIGURE 3.1: several gamma probability density functions f(x)]

Tables of this integral for selected values of r and x have been prepared
and are partially reproduced in Table II in Appendix B.

Example 5. Let X be $\chi^2(10)$. Then, by Table II of Appendix B, with r = 10,

$$\Pr(3.25 \le X \le 20.5) = \Pr(X \le 20.5) - \Pr(X \le 3.25) = 0.975 - 0.025 = 0.95.$$

Again, by way of example, if Pr (a < X) = 0.05, then Pr (X ≤ a) = 0.95, and
thus a = 18.3 from Table II with r = 10.
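
When r is even, the tabled integral can be evaluated in closed form by the gamma and Poisson relationship noted in the Remark earlier in this section (with k = r/2 and $\lambda$ = 1/2). The sketch below is an added illustration under that assumption; the function name is hypothetical, and odd r is deliberately not handled.

```python
from math import exp

def chi_square_cdf_even(x, r):
    """Pr(X <= x) for X that is chi-square(r), r a positive even integer.

    Uses Pr(X <= x) = 1 - sum_{j=0}^{r/2 - 1} (x/2)^j e^(-x/2) / j!.
    """
    if r <= 0 or r % 2 != 0:
        raise ValueError("this sketch only handles positive even r")
    if x <= 0:
        return 0.0
    m = x / 2.0
    term = exp(-m)          # j = 0 term of the Poisson sum
    total = term
    for j in range(1, r // 2):
        term *= m / j       # builds (x/2)^j e^(-x/2) / j! recursively
        total += term
    return 1.0 - total

upper = chi_square_cdf_even(20.5, 10)
lower = chi_square_cdf_even(3.25, 10)
print(round(upper, 3), round(lower, 3), round(upper - lower, 2))
print(round(chi_square_cdf_even(18.3, 10), 2))
```

The printed values agree with the Table II entries quoted in Example 5 to the precision of the table.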


Example 6. Let X have a gamma distribution with $\alpha = r/2$, where r is a
positive integer, and $\beta > 0$. Define the random variable $Y = 2X/\beta$. We seek
the p.d.f. of Y. Now the distribution function of Y is

$$G(y) = \Pr(Y \le y) = \Pr\left(X \le \frac{\beta y}{2}\right).$$

If y ≤ 0, then G(y) = 0; but if y > 0, then

$$G(y) = \int_0^{\beta y/2} \frac{1}{\Gamma(r/2)\, \beta^{r/2}}\, x^{r/2 - 1} e^{-x/\beta}\, dx.$$

Accordingly, the p.d.f. of Y is

$$g(y) = G'(y) = \frac{\beta/2}{\Gamma(r/2)\, \beta^{r/2}} \left(\frac{\beta y}{2}\right)^{r/2 - 1} e^{-y/2} = \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, y^{r/2 - 1} e^{-y/2}$$

if y > 0. That is, Y is $\chi^2(r)$.


EXERCISES 

3.34. If $(1 - 2t)^{-6}$, $t < \tfrac{1}{2}$, is the m.g.f. of the random variable X, find
Pr (X < 5.23).

3.35. If X is $\chi^2(5)$, determine the constants c and d so that
Pr (c < X < d) = 0.95 and Pr (X < c) = 0.025.

3.36. If X has a gamma distribution with $\alpha = 3$ and $\beta = 4$, find
Pr (3.28 < X < 25.2).

Hint: Consider the probability of the equivalent event 1.64 < Y < 12.6,
where Y = 2X/4 = X/2.

3.37. Let X be a random variable such that $E(X^m) = (m + 1)!\, 2^m$,
m = 1, 2, 3, .... Determine the m.g.f. and the distribution of X.

3.38. Show that

$$\int_{\mu}^{\infty} \frac{1}{\Gamma(k)}\, z^{k-1} e^{-z}\, dz = \sum_{x=0}^{k-1} \frac{\mu^x e^{-\mu}}{x!}, \quad k = 1, 2, 3, \ldots.$$

This demonstrates the relationship between the distribution functions of
the gamma and Poisson distributions.

Hint: Either integrate by parts k - 1 times or simply note that the
"antiderivative" of $z^{k-1} e^{-z}$ is

$$-z^{k-1} e^{-z} - (k - 1) z^{k-2} e^{-z} - \cdots - (k - 1)!\, e^{-z}$$

by differentiating the latter expression.

3.39. Let $X_1$, $X_2$, and $X_3$ be independent random variables, each with p.d.f.
$f(x) = e^{-x}$, 0 < x < ∞, zero elsewhere. Find the distribution of
Y = minimum ($X_1, X_2, X_3$).

Hint: Pr (Y ≤ y) = 1 - Pr (Y > y) = 1 - Pr ($X_i$ > y, i = 1, 2, 3).

3.40. Let X have a gamma distribution with p.d.f.

$$f(x) = \frac{1}{\beta^2}\, x e^{-x/\beta}, \quad 0 < x < \infty,$$

zero elsewhere. If x = 2 is the unique mode of the distribution, find the
parameter $\beta$ and Pr (X < 9.49).

3.41. Compute the measures of skewness and kurtosis of a gamma
distribution with parameters $\alpha$ and $\beta$.

3.42. Let X have a gamma distribution with parameters $\alpha$ and $\beta$. Show that
Pr ($X \ge 2\alpha\beta$) ≤ $(2/e)^{\alpha}$.

Hint: Use the result of Exercise 1.115.

3.43. Give a reasonable definition of a chi-square distribution with zero
degrees of freedom.

Hint: Work with the m.g.f. of a distribution that is $\chi^2(r)$ and let r = 0.

3.44. In the Poisson postulates on page 127, let $\lambda$ be a nonnegative function
of w, say $\lambda(w)$, such that $D_w[g(0, w)] = -\lambda(w) g(0, w)$. Suppose that
$\lambda(w) = k r w^{r-1}$, r ≥ 1.

(a) Find g(0, w) noting that g(0, 0) = 1.

(b) Let W be the time that is needed to obtain exactly one change. Then
find the distribution function of W, namely G(w) = Pr (W ≤ w) =
1 - Pr (W > w) = 1 - g(0, w), 0 ≤ w, and then find the p.d.f. of W.
This p.d.f. is that of the Weibull distribution, which is used in the study
of breaking strengths of materials.

3.45. Let X have a Poisson distribution with parameter m. If m is an
experimental value of a random variable having a gamma distribution with
$\alpha = 2$ and $\beta = 1$, compute Pr (X = 0, 1, 2).

3.46. Let X have the uniform distribution with p.d.f. f(x) = 1, 0 < x < 1, zero
elsewhere. Find the distribution function of Y = -2 ln X. What is the p.d.f.
of Y?

3.47. Find the uniform distribution of the continuous type that has the same 
mean and the same variance as those of a chi-square distribution with 8 
degrees of freedom. 


3.4 The Normal Distribution 


Consider the integral

$$I = \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2}\right) dy.$$

This integral exists because the integrand is a positive continuous
function which is bounded by an integrable function; that is,

$$0 < \exp\left(-\frac{y^2}{2}\right) < \exp(-|y| + 1), \quad -\infty < y < \infty,$$

and

$$\int_{-\infty}^{\infty} \exp(-|y| + 1)\, dy = 2e.$$

To evaluate the integral I, we note that I > 0 and that $I^2$ may be written

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2 + z^2}{2}\right) dy\, dz.$$

This iterated integral can be evaluated by changing to polar coordinates.
If we set $y = r\cos\theta$ and $z = r\sin\theta$, we have

$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = \int_0^{2\pi} d\theta = 2\pi.$$

Accordingly, $I = \sqrt{2\pi}$ and

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = 1.$$

If we introduce a new variable of integration, say x, by writing

$$y = \frac{x - a}{b}, \quad b > 0,$$

the preceding integral becomes

$$\int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right] dx = 1.$$

Since b > 0, this implies that

$$f(x) = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right], \quad -\infty < x < \infty,$$

satisfies the conditions of being a p.d.f. of a continuous type of random
variable. A random variable of the continuous type that has a
p.d.f. of the form of f(x) is said to have a normal distribution, and any
f(x) of this form is called a normal p.d.f.

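We can find the m.g.f. of a normal distribution as follows. In

$$M(t) = \int_{-\infty}^{\infty} e^{tx} \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right] dx = \int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left(-\frac{-2b^2 t x + x^2 - 2ax + a^2}{2b^2}\right) dx$$

we complete the square in the exponent. Thus M(t) becomes

$$M(t) = \exp\left[-\frac{a^2 - (a + b^2 t)^2}{2b^2}\right] \int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a - b^2 t)^2}{2b^2}\right] dx = \exp\left(at + \frac{b^2 t^2}{2}\right)$$

because the integrand of the last integral can be thought of as a normal
p.d.f. with a replaced by $a + b^2 t$, and hence it is equal to 1.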
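
The closed form can be compared against a direct numerical evaluation of the defining integral. The sketch below is an added illustration, not the text's argument; it uses a composite Simpson rule over a wide finite interval, and the values of a, b, and t are assumptions chosen for the demonstration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, a, b):
    """Normal p.d.f. with parameters a and b > 0."""
    return exp(-(x - a) ** 2 / (2 * b * b)) / (b * sqrt(2 * pi))

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

a, b, t = 1.0, 2.0, 0.3  # illustrative values, not from the text

# The tails beyond a +/- 12b contribute a negligible amount here.
numeric = simpson(lambda x: exp(t * x) * normal_pdf(x, a, b), a - 12 * b, a + 12 * b)
closed_form = exp(a * t + b * b * t * t / 2)
print(numeric, closed_form)
```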

The mean $\mu$ and variance $\sigma^2$ of a normal distribution will be
calculated from M(t). Now

$$M'(t) = M(t)(a + b^2 t)$$

and

$$M''(t) = M(t)(b^2) + M(t)(a + b^2 t)^2.$$

Thus $\mu = M'(0) = a$ and $\sigma^2 = M''(0) - \mu^2 = b^2 + a^2 - a^2 = b^2$, so that
the m.g.f. may be written $M(t) = \exp(\mu t + \sigma^2 t^2/2)$.

Example 1. If X has the m.g.f.

$$M(t) = e^{2t + 32t^2},$$

then X has a normal distribution with $\mu = 2$, $\sigma^2 = 64$.

The normal p.d.f. occurs so frequently in certain parts of statistics
that we denote it, for brevity, by $N(\mu, \sigma^2)$. Thus, if we say that the
random variable X is N(0, 1), we mean that X has a normal distribution
with mean $\mu = 0$ and variance $\sigma^2 = 1$, so that the p.d.f. of X is

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty.$$

If we say that X is N(5, 4), we mean that X has a normal distribution
with mean $\mu = 5$ and variance $\sigma^2 = 4$, so that the p.d.f. of X is

$$f(x) = \frac{1}{2\sqrt{2\pi}} \exp\left[-\frac{(x - 5)^2}{2(4)}\right], \quad -\infty < x < \infty.$$

Moreover, if

$$M(t) = e^{t^2/2},$$

then X is N(0, 1).

The graph of a normal p.d.f. $N(\mu, \sigma^2)$
is seen (1) to be symmetric about a vertical axis through $x = \mu$, (2)
to have its maximum of $1/(\sigma\sqrt{2\pi})$ at $x = \mu$, and (3) to have the x-axis
as a horizontal asymptote. It should also be verified that (4) there
are points of inflection at $x = \mu \pm \sigma$.

Remark. Each of the special distributions considered thus far has been 
“justified” by some derivation that is based upon certain concepts found 
in elementary probability theory. Such a motivation for the normal 
distribution is not given at this time; a motivation is presented in Chapter 5. 
However, the normal distribution is one of the more widely used distributions 
in applications of statistical methods. Variables that are often assumed to be 
random variables having normal distributions (with appropriate values of $\mu$
and $\sigma$) are the diameter of a hole made by a drill press, the score on a test,
the yield of a grain on a plot of ground, and the length of a newborn child. 

We now prove a very useful theorem. 


Theorem 1. If the random variable X is $N(\mu, \sigma^2)$, $\sigma^2 > 0$, then the
random variable $W = (X - \mu)/\sigma$ is N(0, 1).

Proof. The distribution function G(w) of W is, since $\sigma > 0$,

$$G(w) = \Pr\left(\frac{X - \mu}{\sigma} \le w\right) = \Pr(X \le w\sigma + \mu).$$

That is,

$$G(w) = \int_{-\infty}^{w\sigma + \mu} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx.$$

If we change the variable of integration by writing $y = (x - \mu)/\sigma$, then

$$G(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy.$$

Accordingly, the p.d.f. g(w) = G'(w) of the continuous-type random
variable W is

$$g(w) = \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}, \quad -\infty < w < \infty.$$

Thus W is N(0, 1), which is the desired result (see also Exercise 3.100).

This fact considerably simplifies the calculations of probabilities
concerning normally distributed variables, as will be seen presently.

[FIGURE 3.2: graphs of $\Phi(z)$ and $\varphi(z)$]



Suppose that X is $N(\mu, \sigma^2)$. Then, with $c_1 < c_2$ we have, since
Pr ($X = c_1$) = 0,

$$\Pr(c_1 < X < c_2) = \Pr(X < c_2) - \Pr(X < c_1)$$
$$= \int_{-\infty}^{(c_2 - \mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw - \int_{-\infty}^{(c_1 - \mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw$$

because $W = (X - \mu)/\sigma$ is N(0, 1). That is, probabilities concerning X,
which is $N(\mu, \sigma^2)$, can be expressed in terms of probabilities concerning
W, which is N(0, 1).

An integral such as

$$\int_{-\infty}^{k} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw$$

cannot be evaluated by the fundamental theorem of calculus because
an "antiderivative" of $e^{-w^2/2}$ is not expressible as an elementary
function. Instead, tables of the approximate value of this integral for
various values of k have been prepared and are partially reproduced
in Table III in Appendix B. We use the notation

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw.$$

Moreover, we say that $\Phi(z)$ and its derivative $\Phi'(z) = \varphi(z)$ are,
respectively, the distribution function and p.d.f. of a standard normal
distribution N(0, 1). These are depicted in Figure 3.2.

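To summarize, we have shown that if X is $N(\mu, \sigma^2)$, then

$$\Pr(c_1 < X < c_2) = \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_2 - \mu}{\sigma}\right) - \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_1 - \mu}{\sigma}\right) = \Phi\left(\frac{c_2 - \mu}{\sigma}\right) - \Phi\left(\frac{c_1 - \mu}{\sigma}\right).$$

It is left as an exercise to show that $\Phi(-x) = 1 - \Phi(x)$.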
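Example 2. Let X be N(2, 25). Then, by Table III,

$$\Pr(0 < X < 10) = \Phi\left(\frac{10 - 2}{5}\right) - \Phi\left(\frac{0 - 2}{5}\right) = \Phi(1.6) - \Phi(-0.4) = 0.945 - (1 - 0.655) = 0.600$$

and

$$\Pr(-8 < X < 1) = \Phi\left(\frac{1 - 2}{5}\right) - \Phi\left(\frac{-8 - 2}{5}\right) = \Phi(-0.2) - \Phi(-2) = (1 - 0.579) - (1 - 0.977) = 0.398.$$

Example 3. Let X be $N(\mu, \sigma^2)$. Then, by Table III,

$$\Pr(\mu - 2\sigma < X < \mu + 2\sigma) = \Phi\left(\frac{\mu + 2\sigma - \mu}{\sigma}\right) - \Phi\left(\frac{\mu - 2\sigma - \mu}{\sigma}\right) = \Phi(2) - \Phi(-2) = 0.977 - (1 - 0.977) = 0.954.$$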
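
In place of Table III, $\Phi(z)$ can be evaluated through the error function, using the standard identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The sketch below is an added illustration with hypothetical function names; it reproduces the probabilities of Examples 2 and 3 to within the rounding of the three-decimal table.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Phi(z) for the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_interval_prob(c1, c2, mu, sigma):
    """Pr(c1 < X < c2) when X is N(mu, sigma^2)."""
    return std_normal_cdf((c2 - mu) / sigma) - std_normal_cdf((c1 - mu) / sigma)

example_2a = normal_interval_prob(0, 10, 2, 5)    # X is N(2, 25)
example_2b = normal_interval_prob(-8, 1, 2, 5)
example_3 = normal_interval_prob(-2, 2, 0, 1)     # mu - 2 sigma to mu + 2 sigma

print(example_2a, example_2b, example_3)
```

Small differences in the third decimal place from the text's values are expected, since the text subtracts already-rounded table entries.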


Example 4. Suppose that 10 percent of the probability for a certain
distribution that is $N(\mu, \sigma^2)$ is below 60 and that 5 percent is above 90. What
are the values of $\mu$ and $\sigma$? We are given that the random variable X
is $N(\mu, \sigma^2)$ and that Pr (X ≤ 60) = 0.10 and Pr (X ≤ 90) = 0.95. Thus
$\Phi[(60 - \mu)/\sigma] = 0.10$ and $\Phi[(90 - \mu)/\sigma] = 0.95$. From Table III we have

$$\frac{60 - \mu}{\sigma} = -1.282, \qquad \frac{90 - \mu}{\sigma} = 1.645.$$

These conditions require that $\mu = 73.1$ and $\sigma = 10.2$ approximately.
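
The two linear conditions of Example 4 can be solved mechanically. The following sketch is an added illustration: it obtains the standard normal percentiles from `statistics.NormalDist().inv_cdf` in the Python standard library (available in Python 3.8 and later) rather than from Table III, and then solves for $\sigma$ and $\mu$.

```python
from statistics import NormalDist

standard = NormalDist()            # N(0, 1)
z_low = standard.inv_cdf(0.10)     # about -1.282
z_high = standard.inv_cdf(0.95)    # about  1.645

# (60 - mu)/sigma = z_low and (90 - mu)/sigma = z_high.
# Subtracting the first equation from the second eliminates mu.
sigma = (90 - 60) / (z_high - z_low)
mu = 60 - z_low * sigma

print(round(mu, 1), round(sigma, 1))
```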

Remark. In this chapter we have illustrated three types of parameters
associated with distributions. The mean $\mu$ of $N(\mu, \sigma^2)$ is called a location
parameter because changing its value simply changes the location of the
middle of the normal p.d.f.; that is, the graph of the p.d.f. looks exactly the
same except for a shift in location. The standard deviation $\sigma$ of $N(\mu, \sigma^2)$ is called
a scale parameter because changing its value changes the spread of the
distribution. That is, a small value of $\sigma$ requires the graph of the normal
p.d.f. to be tall and narrow, while a large value of $\sigma$ requires it to spread out
and not be so tall. No matter what the values of $\mu$ and $\sigma$, however, the graph
of the normal p.d.f. will be that familiar "bell shape." Incidentally, the $\beta$ of
the gamma distribution is also a scale parameter. On the other hand, the $\alpha$
of the gamma distribution is called a shape parameter, as changing its value
modifies the shape of the graph of the p.d.f., as can be seen by referring to
Figure 3.1. The parameters p and $\mu$ of the binomial and Poisson distributions,
respectively, are also shape parameters.

We close this section with an important theorem. 


Theorem 2. If the random variable X is $N(\mu, \sigma^2)$, $\sigma^2 > 0$, then the
random variable $V = (X - \mu)^2/\sigma^2$ is $\chi^2(1)$.

Proof. Because $V = W^2$, where $W = (X - \mu)/\sigma$ is N(0, 1), the
distribution function G(v) of V is, for v ≥ 0,

$$G(v) = \Pr(W^2 \le v) = \Pr(-\sqrt{v} \le W \le \sqrt{v}).$$

That is,

$$G(v) = 2 \int_0^{\sqrt{v}} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw, \quad 0 \le v,$$

and

$$G(v) = 0, \quad v < 0.$$

If we change the variable of integration by writing $w = \sqrt{y}$, then

$$G(v) = \int_0^{v} \frac{1}{\sqrt{2\pi}\sqrt{y}}\, e^{-y/2}\, dy, \quad 0 \le v.$$

Hence the p.d.f. g(v) = G'(v) of the continuous-type random variable
V is

$$g(v) = \frac{1}{\sqrt{\pi}\sqrt{2}}\, v^{1/2 - 1} e^{-v/2}, \quad 0 < v < \infty,$$

= 0 elsewhere.

Since g(v) is a p.d.f. and hence

$$\int_0^{\infty} g(v)\, dv = 1,$$

it must be that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$ and thus V is $\chi^2(1)$.

EXERCISES


3.48. If

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw,$$

show that $\Phi(-z) = 1 - \Phi(z)$.

3.49. If X is N(75, 100), find Pr (X < 60) and Pr (10 < X < 100).

3.50. If X is $N(\mu, \sigma^2)$, find b so that Pr [-b < (X - $\mu$)/$\sigma$ < b] = 0.90.

3.51. Let X be $N(\mu, \sigma^2)$ so that Pr (X < 89) = 0.90 and Pr (X < 94) = 0.95.
Find $\mu$ and $\sigma^2$.

3.52. Show that the constant c can be selected so that $f(x) = c\,2^{-x^2}$,
-∞ < x < ∞, satisfies the conditions of a normal p.d.f.

Hint: Write $2 = e^{\ln 2}$.

3.53. If X is $N(\mu, \sigma^2)$, show that $E(|X - \mu|) = \sigma\sqrt{2/\pi}$.

3.54. Show that the graph of a p.d.f. $N(\mu, \sigma^2)$ has points of inflection at
$x = \mu - \sigma$ and $x = \mu + \sigma$.

3.55. Evaluate $\int_2^3 \exp[-2(x - 3)^2]\, dx$.

3.56. Determine the ninetieth percentile of the distribution, which is
N(65, 25).

3.57. If $e^{3t + 8t^2}$ is the m.g.f. of the random variable X, find Pr (-1 < X < 9).

3.58. Let the random variable X have the p.d.f. 

J[x) = - 0 < x < oo, zero elsewhere. 

Find the mean and variance of X. 

Hint: Compute E(X) directly and $E(X^2)$ by comparing that integral with
the integral representing the variance of a variable that is N(0, 1).

3.59. Let X be N(5, 10). Find Pr [0.04 < $(X - 5)^2$ < 38.4].

3.60. If X is N(1, 4), compute the probability Pr (1 < $X^2$ < 9).

3.61. If X is N(75, 25), find the conditional probability that X is greater than
80 relative to the hypothesis that X is greater than 77. See Exercise 2.18.

3.62. Let X be a random variable such that $E(X^{2m}) = (2m)!/(2^m m!)$,
m = 1, 2, 3, ... and $E(X^{2m-1}) = 0$, m = 1, 2, 3, .... Find the m.g.f. and
the p.d.f. of X.

3.63. Let the mutually independent random variables X t , X 2 , and X 3 be 
N(0, 1), )V(2,4), and iV( — 1, 1) ， respectively. Compute the probability that 
exactly two of these three variables are less than zero. 
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3.64. Compute the measures of skewness and kurtosis of a distribution which 
is N(n, a 2 ). 

3.65. Let the random variable. have a distribution that is N(n ， a 1 ). 

(a) Does the random variable Y = X 2 also have a normal distribution? 

(b) Would the random variable Y = aX + b, a and b nonzero constants, 
have a normal distribution? 

Hint: In each case, first determine Pr (y < 少 ). 

3.66. Let the random variable X be o' 2 )- What would this distribution be 
if a 2 = 0? 

Hint: Look at the m.g.f. of X for o- 2 > 0 and investigate its limit as 
a 2 ^0. — 

3.67. Let $\varphi(x)$ and $\Phi(x)$ be the p.d.f. and distribution function of a standard
normal distribution. Let $Y$ have a truncated distribution with p.d.f.
$g(y) = \varphi(y)/[\Phi(b) - \Phi(a)]$, $a < y < b$, zero elsewhere. Show that $E(Y)$ is
equal to $[\varphi(a) - \varphi(b)]/[\Phi(b) - \Phi(a)]$.

3.68. Let $f(x)$ and $F(x)$ be the p.d.f. and the distribution function of a
distribution of the continuous type such that $f'(x)$ exists for all $x$. Let the
mean of the truncated distribution that has p.d.f. $g(y) = f(y)/F(b)$,
$-\infty < y < b$, zero elsewhere, be equal to $-f(b)/F(b)$ for all real $b$. Prove
that $f(x)$ is a p.d.f. of a standard normal distribution.

3.69. Let $X$ and $Y$ be independent random variables, each with a distribution
that is $N(0, 1)$. Let $Z = X + Y$. Find the integral that represents the
distribution function $G(z) = \Pr(X + Y \le z)$ of $Z$. Determine the p.d.f.
of $Z$.

Hint: We have that $G(z) = \int_{-\infty}^{\infty} H(x, z)\, dx$, where

$$H(x, z) = \int_{-\infty}^{z - x} \frac{1}{2\pi} \exp[-(x^2 + y^2)/2]\, dy.$$

Find $G'(z)$ by evaluating $\int_{-\infty}^{\infty} [\partial H(x, z)/\partial z]\, dx$.

3.5 The Bivariate Normal Distribution 

Remark. If the reader with an adequate background in matrix algebra so 
chooses, this section can be omitted at this point and Section 4.10 can be 
considered later. If this decision is made, only an example in Section 4.7 and 
a few exercises need be skipped because the bivariate normal distribution 
would not be known. Many statisticians, however, find it easier to remember 
the multivariate (including the bivariate) normal p.d.f. and m.g.f. using 
matrix notation that is used in Section 4.10. Moreover, that section provides 
an excellent example of a transformation (in particular, an orthogonal one)
and a good illustration of the moment-generating function technique; these
are two of the major concepts introduced in Chapter 4.

Let us investigate the function

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}, \qquad -\infty < x < \infty, \quad -\infty < y < \infty,$$

where, with $\sigma_1 > 0$, $\sigma_2 > 0$, and $-1 < \rho < 1$,

$$q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right].$$

At this point we do not know that the constants $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$
are those respective parameters of a distribution. As a matter of fact,
we do not know that $f(x, y)$ has the properties of a joint p.d.f. It will
be shown that:


1. $f(x, y)$ is a joint p.d.f.

2. $X$ is $N(\mu_1, \sigma_1^2)$ and $Y$ is $N(\mu_2, \sigma_2^2)$.

3. $\rho$ is the correlation coefficient of $X$ and $Y$.

A joint p.d.f. of this form is called a bivariate normal p.d.f., and the 
random variables X and Y are said to have a bivariate normal 
distribution. 

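That the nonnegative function $f(x, y)$ is actually a joint p.d.f. can
be seen as follows. Define $f_1(x)$ by

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy.$$

Now

$$(1 - \rho^2)q = \left[\left(\frac{y - \mu_2}{\sigma_2}\right) - \rho\left(\frac{x - \mu_1}{\sigma_1}\right)\right]^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2 = \left(\frac{y - b}{\sigma_2}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2,$$

where $b = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Thus

$$f_1(x) = \frac{\exp[-(x - \mu_1)^2/2\sigma_1^2]}{\sigma_1\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\exp\{-(y - b)^2/[2\sigma_2^2(1 - \rho^2)]\}}{\sigma_2\sqrt{1 - \rho^2}\,\sqrt{2\pi}}\, dy.$$

For the purpose of integration, the integrand of the integral in this
expression for $f_1(x)$ may be considered a normal p.d.f. with mean $b$ and
variance $\sigma_2^2(1 - \rho^2)$. Thus this integral is equal to 1 and

$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(x - \mu_1)^2}{2\sigma_1^2}\right], \qquad -\infty < x < \infty.$$

Since

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_{-\infty}^{\infty} f_1(x)\, dx = 1,$$

the nonnegative function $f(x, y)$ is a joint p.d.f. of two continuous-
type random variables $X$ and $Y$. Accordingly, the function $f_1(x)$ is
the marginal p.d.f. of $X$, and $X$ is seen to be $N(\mu_1, \sigma_1^2)$. In like
manner, we see that $Y$ is $N(\mu_2, \sigma_2^2)$.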

Moreover, from the development above, we note that

$$f(x, y) = f_1(x)\left(\frac{1}{\sigma_2\sqrt{1 - \rho^2}\,\sqrt{2\pi}} \exp\left[-\frac{(y - b)^2}{2\sigma_2^2(1 - \rho^2)}\right]\right),$$

where $b = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Accordingly, the second factor in the
right-hand member of the equation above is the conditional p.d.f. of
$Y$, given that $X = x$. That is, the conditional p.d.f. of $Y$, given $X = x$,
is itself normal with mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and variance
$\sigma_2^2(1 - \rho^2)$. Thus, with a bivariate normal distribution, the conditional
mean of $Y$, given that $X = x$, is linear in $x$ and is given by

$$E(Y|x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1).$$

Since the coefficient of $x$ in this linear conditional mean $E(Y|x)$
is $\rho\sigma_2/\sigma_1$, and since $\sigma_1$ and $\sigma_2$ represent the respective standard
deviations, the number $\rho$ is, in fact, the correlation coefficient of $X$ and
$Y$. This follows from the result, established in Section 2.3, that the
coefficient of $x$ in a general linear conditional mean $E(Y|x)$ is the
product of the correlation coefficient and the ratio $\sigma_2/\sigma_1$.

Although the mean of the conditional distribution of $Y$, given
$X = x$, depends upon $x$ (unless $\rho = 0$), the variance $\sigma_2^2(1 - \rho^2)$ is the
same for all real values of $x$. Thus, by way of example, given that $X = x$,
the conditional probability that $Y$ is within $(2.576)\sigma_2\sqrt{1 - \rho^2}$ units of
the conditional mean is 0.99, whatever the value of $x$ may be. In this
sense, most of the probability for the distribution of $X$ and $Y$ lies in
the band

$$\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1) \pm (2.576)\sigma_2\sqrt{1 - \rho^2}$$

about the graph of the linear conditional mean. For every fixed
positive $\sigma_2$, the width of this band depends upon $\rho$. Because the band
is narrow when $\rho^2$ is nearly 1, we see that $\rho$ does measure the intensity
of the concentration of the probability for $X$ and $Y$ about the linear
conditional mean. This is the fact to which we alluded in the remark
of Section 2.3.

In a similar manner we can show that the conditional distribution
of $X$, given $Y = y$, is the normal distribution

$$N\left[\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\; \sigma_1^2(1 - \rho^2)\right].$$
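The factorization of the joint p.d.f. into the marginal p.d.f. of $X$ times the conditional p.d.f. of $Y$, given $X = x$, suggests one way to generate observations from a bivariate normal distribution: draw $x$ from $N(\mu_1, \sigma_1^2)$, then draw $y$ from the conditional normal distribution. The sketch below is an illustration only and is not taken from the text; the function names, parameter values, sample size, and seed are assumptions.

```python
import math
import random


def sample_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, rng):
    """Draw one (x, y) pair using marginal-times-conditional sampling."""
    x = rng.gauss(mu1, sigma1)
    # Conditional distribution of Y given X = x: normal with
    # mean mu2 + rho*(sigma2/sigma1)*(x - mu1), variance sigma2^2*(1 - rho^2).
    cond_mean = mu2 + rho * (sigma2 / sigma1) * (x - mu1)
    cond_sd = sigma2 * math.sqrt(1.0 - rho ** 2)
    y = rng.gauss(cond_mean, cond_sd)
    return x, y


def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    pairs = [sample_bivariate_normal(5.8, 5.3, 0.2, 0.2, 0.6, rng)
             for _ in range(20_000)]
    print(sample_correlation(pairs))  # should be near rho = 0.6
```

With a large sample, the sample correlation coefficient should be close to the value of $\rho$ that was supplied, which is consistent with item 3 of the list above.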

Example 1. Let us assume that in a certain population of married
couples the height $X_1$ of the husband and the height $X_2$ of the wife have a
bivariate normal distribution with parameters $\mu_1 = 5.8$ feet, $\mu_2 = 5.3$ feet,
$\sigma_1 = \sigma_2 = 0.2$ foot, and $\rho = 0.6$. The conditional p.d.f. of $X_2$, given $X_1 = 6.3$,
is normal with mean $5.3 + (0.6)(6.3 - 5.8) = 5.6$ and standard deviation
$(0.2)\sqrt{1 - 0.36} = 0.16$. Accordingly, given that the height of the husband
is 6.3 feet, the probability that his wife has a height between 5.28 and 5.92
feet is

$$\Pr(5.28 < X_2 < 5.92 \mid X_1 = 6.3) = \Phi(2) - \Phi(-2) = 0.954.$$

The interval (5.28, 5.92) could be thought of as a 95.4 percent prediction
interval for the wife's height, given $X_1 = 6.3$.
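The arithmetic of Example 1 can be checked with a few lines of code. This is a sketch added for illustration; the helper names are assumptions, and $\Phi$ is computed from the error function through the standard identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z), the distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_params(x1, mu1, mu2, sigma1, sigma2, rho):
    """Mean and standard deviation of X2 given X1 = x1 (bivariate normal)."""
    mean = mu2 + rho * (sigma2 / sigma1) * (x1 - mu1)
    sd = sigma2 * math.sqrt(1.0 - rho ** 2)
    return mean, sd


# Parameters of Example 1: heights in feet.
mean, sd = conditional_params(6.3, mu1=5.8, mu2=5.3,
                              sigma1=0.2, sigma2=0.2, rho=0.6)
prob = std_normal_cdf((5.92 - mean) / sd) - std_normal_cdf((5.28 - mean) / sd)

if __name__ == "__main__":
    print(mean, sd, prob)
```

The computed mean and standard deviation agree with the values 5.6 and 0.16 of the example, and the probability agrees with 0.954 to three decimal places.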


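The m.g.f. of a bivariate normal distribution can be determined as
follows. We have

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\, dx\, dy = \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)\left[\int_{-\infty}^{\infty} e^{t_2 y} f_{2|1}(y|x)\, dy\right] dx$$

for all real values of $t_1$ and $t_2$. The integral within the brackets is the
m.g.f. of the conditional p.d.f. $f_{2|1}(y|x)$. Since $f_{2|1}(y|x)$ is a normal p.d.f.
with mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and variance $\sigma_2^2(1 - \rho^2)$, then

$$\int_{-\infty}^{\infty} e^{t_2 y} f_{2|1}(y|x)\, dy = \exp\left\{t_2\left[\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right] + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2}\right\}.$$

Accordingly, $M(t_1, t_2)$ can be written in the form

$$\exp\left\{t_2\mu_2 - t_2\rho\frac{\sigma_2}{\sigma_1}\mu_1 + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2}\right\} \int_{-\infty}^{\infty} \exp\left[\left(t_1 + t_2\rho\frac{\sigma_2}{\sigma_1}\right)x\right] f_1(x)\, dx.$$

But $E(e^{tX}) = \exp[\mu_1 t + (\sigma_1^2 t^2)/2]$ for all real values of $t$. Accordingly,
if we set $t = t_1 + t_2\rho(\sigma_2/\sigma_1)$, we see that $M(t_1, t_2)$ is given by

$$\exp\left\{t_2\mu_2 - t_2\rho\frac{\sigma_2}{\sigma_1}\mu_1 + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2} + \mu_1\left(t_1 + t_2\rho\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2\left(t_1 + t_2\rho\frac{\sigma_2}{\sigma_1}\right)^2}{2}\right\}$$

or, equivalently,

$$M(t_1, t_2) = \exp\left(\mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2}{2}\right).$$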

It is interesting to note that if, in this m.g.f. $M(t_1, t_2)$, the correlation
coefficient $\rho$ is set equal to zero, then

$$M(t_1, t_2) = M(t_1, 0)M(0, t_2).$$

Thus $X$ and $Y$ are independent when $\rho = 0$. If, conversely,

$$M(t_1, t_2) = M(t_1, 0)M(0, t_2),$$

we have $e^{\rho\sigma_1\sigma_2 t_1 t_2} = 1$. Since each of $\sigma_1$ and $\sigma_2$ is positive, then $\rho = 0$.
Accordingly, we have the following theorem.


Theorem 3. Let $X$ and $Y$ have a bivariate normal distribution with
means $\mu_1$ and $\mu_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient
$\rho$. Then $X$ and $Y$ are independent if and only if $\rho = 0$.

As a matter of fact, if any two random variables are independent
and have positive standard deviations, we have noted in Example 4 of
Section 2.4 that $\rho = 0$. However, $\rho = 0$ does not in general imply that
two variables are independent; this can be seen in Exercises 2.20(c) and
2.25. The importance of Theorem 3 lies in the fact that we now know
when and only when two random variables that have a bivariate
normal distribution are independent.
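To see numerically why zero correlation alone does not give independence, consider a pair that is not bivariate normal. The sketch below is an illustration that is not drawn from the text or from the cited exercises: it takes $X$ to be $N(0, 1)$ and $Y = X^2$, so that $Y$ is a function of $X$ and yet the correlation coefficient is zero because $E(X^3) = 0$. The seed and sample size are arbitrary assumptions.

```python
import random


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # arbitrary fixed seed
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [x * x for x in xs]  # Y = X^2 is completely determined by X

r = sample_correlation(xs, ys)

# Compare Pr(|X| <= 1 and Y <= 1) with Pr(|X| <= 1) * Pr(Y <= 1).
p_a = sum(abs(x) <= 1.0 for x in xs) / len(xs)
p_b = sum(y <= 1.0 for y in ys) / len(ys)
p_joint = sum(abs(x) <= 1.0 and y <= 1.0 for x, y in zip(xs, ys)) / len(xs)

if __name__ == "__main__":
    print(r, p_joint, p_a * p_b)
```

The sample correlation is near zero, but the joint relative frequency is far from the product of the two marginal relative frequencies, so the variables are dependent. Theorem 3 does not apply here because $(X, X^2)$ does not have a bivariate normal distribution.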
EXERCISES

3.70. Let $X$ and $Y$ have a bivariate normal distribution with respective
parameters $\mu_1 = 2.8$, $\mu_2 = 110$, $\sigma_1^2 = 0.16$, $\sigma_2^2 = 100$, and $\rho = 0.6$.
Compute:

(a) $\Pr(106 < Y < 124)$.

(b) $\Pr(106 < Y < 124 \mid X = 3.2)$.

3.71. Let X and Y have a bivariate normal distribution with parameters 

Hi = 3, fi 2 = 1, o\ = 16, — 25， and p =\. Determine the following 

probabilities: 

(a) $\Pr(3 < Y < 8)$.

(b) $\Pr(3 < Y < 8 \mid X = 7)$.

(c) $\Pr(-3 < X < 3)$.

(d) $\Pr(-3 < X < 3 \mid Y = -4)$.

3.72. If $M(t_1, t_2)$ is the m.g.f. of a bivariate normal distribution, compute the
covariance by using the formula

$$\frac{\partial^2 M(0, 0)}{\partial t_1\, \partial t_2} - \frac{\partial M(0, 0)}{\partial t_1}\, \frac{\partial M(0, 0)}{\partial t_2}.$$

Now let $\psi(t_1, t_2) = \ln M(t_1, t_2)$. Show that $\partial^2\psi(0, 0)/\partial t_1\, \partial t_2$ gives this
covariance directly.

3.73. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 5$, $\mu_2 = 10$, $\sigma_1^2 = 1$, $\sigma_2^2 = 25$, and $\rho > 0$. If $\Pr(4 < Y < 16 \mid X = 5) = 0.954$,
determine $\rho$.

3.74. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 20$, $\mu_2 = 40$, $\sigma_1^2 = 9$, $\sigma_2^2 = 4$, and $\rho = 0.6$. Find the shortest interval for
which 0.90 is the conditional probability that $Y$ is in this interval, given that
$X = 22$.

3.75. Say the correlation coefficient between the heights of husbands and 
wives is 0.70 and the mean male height is 5 feet 10 inches with standard 
deviation 2 inches, and the mean female height is 5 feet 4 inches with 
standard deviation 1 ； inches. Assuming a bivariate normal distribution, 
what is the best guess of the height of a woman whose husband’s height is 
6 feet? Find a 95 percent prediction interval for her height. 

3.76. Let

$$f(x, y) = (1/2\pi) \exp\left[-\tfrac{1}{2}(x^2 + y^2)\right]\left\{1 + xy \exp\left[-\tfrac{1}{2}(x^2 + y^2 - 2)\right]\right\},$$

where $-\infty < x < \infty$, $-\infty < y < \infty$. If $f(x, y)$ is a joint p.d.f., it is not a
normal bivariate p.d.f. Show that $f(x, y)$ actually is a joint p.d.f. and that
each marginal p.d.f. is normal. Thus the fact that each marginal p.d.f. is
normal does not imply that the joint p.d.f. is bivariate normal.

3.77. Let $X$, $Y$, and $Z$ have the joint p.d.f.

$$\left(\frac{1}{2\pi}\right)^{3/2} \exp\left(-\frac{x^2 + y^2 + z^2}{2}\right)\left[1 + xyz \exp\left(-\frac{x^2 + y^2 + z^2}{2}\right)\right],$$

where $-\infty < x < \infty$, $-\infty < y < \infty$, and $-\infty < z < \infty$. While $X$, $Y$, and
$Z$ are obviously dependent, show that $X$, $Y$, and $Z$ are pairwise independent
and that each pair has a bivariate normal distribution.

3.78. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = \mu_2 = 0$, $\sigma_1^2 = \sigma_2^2 = 1$, and correlation coefficient $\rho$. Find the distribution
of the random variable $Z = aX + bY$ in which $a$ and $b$ are nonzero
constants.

Hint: Write $G(z) = \Pr(Z \le z)$ as an iterated integral and compute
$G'(z) = g(z)$ by differentiating under the first integral sign and then
evaluating the resulting integral by completing the square in the exponent.


ADDITIONAL EXERCISES 


3.79. Let X have a binomial distribution with parameters n = 288 and 
/ Use Chebyshev’s inequality to determine a lower bound for 
Pr(76 < JIT< 116). 


3.80. Let f[x) 


e—W 

~icr , 


of n so that x = 


x = 0, 1 ， 2, ... ， zero elsewhere. Find the values 
is the unique mode; that is, y(0) <y(l) and 


3.81. Let X and y be two independent binomial variables with parameters 
« = 4, p = 5 and « = 3, p = f, respectively. Determine Pr (X — Y = 3). 

3.82. Let X and Y be two independent binomial variables, both with 
parameters n and p = \. Show that 

( 2 / 0 ! 


Pr (I - y = 0) 


n\ n\ (2，. 


3.83. Two people toss a coin five independent times each. Find the
probability that they will obtain the same number of heads.


3.84. Color blindness appears in 1 percent of the people in a certain
population. How large must a sample with replacement be if the
probability of its containing at least one color-blind person is to be at least 0.95?
Assume a binomial distribution $b(n, p = 0.01)$ and find $n$.

3.85. Assume that the number $X$ of hours of sunshine per day in a certain
place has a chi-square distribution with 10 degrees of freedom. The profit
of a certain outdoor activity depends upon the number of hours of
sunshine through the formula

$$\text{profit} = 1000(1 - e^{-X/10}).$$

Find the expected level of the profit.

3.86. Place five similar balls (each either red or blue) in a bowl at random as 
follows: A coin is flipped 5 independent times and a red ball is placed in the 
bowl for each head and a blue ball for each tail. The bowl is then taken and 
two balls are selected at random without replacement. Given that each of 
those two balls is red, compute the conditional probability that 5 red balls 
were placed in the bowl at random. 

3.87. If a die is rolled four independent times, what is the probability of one 
four, two fives, and one six, given that at least one six is produced? 

3.88. Let the p.d.f. $f(x)$ be positive on, and only on, the integers
$0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10$, so that $f(x) = [(11 - x)/x]f(x - 1)$, $x = 1, 2,
3, \ldots, 10$. Find $f(x)$.

3.89. Let X and Y have a bivariate normal distribution with ii x — 5,n 2 = 10, 
a? = 1, <t^ = 25, and p Compute Pr (7 < F < 19|x = 5). 

3.90. Say that Jim has three cents and that Bill has seven cents. A coin is 
tossed ten independent times. For each head that appears, Bill pays Jim 
two cents, and for each tail that appears, Jim pays Bill one cent. What 
is the probability that neither person is in debt after the ten trials? 

3.91. If $E(X^r) = [(r + 1)!](2^r)$, $r = 1, 2, 3, \ldots$, find the m.g.f. and p.d.f.
of $X$.

3.92. For a biased coin, say that the probability of exactly two heads in three 
independent tosses is 3 . What is the probability of exactly six heads in nine 
independent tosses of this coin? 

3.93. It is discovered that 75 percent of the pages of a certain book contain 
no errors. If we assume that the number of errors per page follows a Poisson 
distribution, find the percentage of pages that have exactly one error. 

3.94. Let $X$ have a Poisson distribution with double mode at $x = 1$ and $x = 2$.
Find $\Pr[X = 0]$.

3.95. Let $X$ and $Y$ be jointly normally distributed with $\mu_X = 20$, $\mu_Y = 40$,
$\sigma_X = 3$, $\sigma_Y = 2$, $\rho = 0.6$. Find a symmetric interval about the conditional
mean, so that the probability is 0.90 that $Y$ lies in that interval given that
$X$ equals 25.
3.96. Let / [x) = — / x = 0, 1,..., 10, zero elsewhere. Find

the values of p, so that7(0) >J[l) > >Am. 

3.97. Let $f(x, y)$ be a bivariate normal p.d.f. and let $c$ be a positive constant
so that $c < (2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2})^{-1}$. Show that $c = f(x, y)$ defines an ellipse in
the $xy$-plane.


3.98. Let $f_1(x, y)$ and $f_2(x, y)$ be two bivariate normal probability density
functions, each having means equal to zero and variances equal to 1.
The respective correlation coefficients are $\rho$ and $-\rho$. Consider the joint
distribution of $X$ and $Y$ defined by the joint p.d.f. $[f_1(x, y) + f_2(x, y)]/2$.
Show that the two marginal distributions are both $N(0, 1)$, $X$ and $Y$ are
dependent, and $E(XY) = 0$ and hence the correlation coefficient of $X$ and
$Y$ is zero.


3.99. Let $X$ be $N(\mu, \sigma^2)$. Define the random variable $Y = e^X$ and find its p.d.f.
by differentiating $G(y) = \Pr(e^X \le y) = \Pr(X \le \ln y)$. This is the p.d.f. of a
lognormal distribution.

3.100. In the proof of Theorem 1 of Section 3.4, we could let

$$G(w) = \Pr(X \le w\sigma + \mu) = F(w\sigma + \mu),$$

where $F$ and $F' = f$ are the distribution function and p.d.f. of $X$,
respectively. Then, by the chain rule,

$$g(w) = G'(w) = [F'(w\sigma + \mu)]\sigma.$$

Show that the right-hand member is the p.d.f. of a standard normal
distribution; thus this provides another proof of Theorem 1.
Distributions of Functions of Random Variables


4.1 Sampling Theory 

Let $X_1, X_2, \ldots, X_n$ denote $n$ random variables that have the joint
p.d.f. $f(x_1, x_2, \ldots, x_n)$. These variables may or may not be
independent. Problems such as the following are very interesting in
themselves; but more important, their solutions often provide the basis
for making statistical inferences. Let $Y$ be a random variable that is
defined by a function of $X_1, X_2, \ldots, X_n$, say $Y = u(X_1, X_2, \ldots, X_n)$.
Once the p.d.f. $f(x_1, x_2, \ldots, x_n)$ is given, can we find the p.d.f. of $Y$?
In some of the preceding chapters, we have solved a few of these
problems. Among them are the following two. If $n = 1$ and if $X_1$ is
$N(\mu, \sigma^2)$, then $Y = (X_1 - \mu)/\sigma$ is $N(0, 1)$. Let $n$ be a positive integer and
let the random variables $X_i$, $i = 1, 2, \ldots, n$, be independent, each
having the same p.d.f. $f(x) = p^x(1 - p)^{1 - x}$, $x = 0, 1$, and zero
elsewhere. If $Y = \sum_{1}^{n} X_i$, then $Y$ is $b(n, p)$. It should be observed that
$Y = u(X_1) = (X_1 - \mu)/\sigma$ is a function of $X_1$ that depends upon
the two parameters of the normal distribution; whereas
$Y = u(X_1, X_2, \ldots, X_n) = \sum_{1}^{n} X_i$ does not depend upon $p$, the parameter of
the common p.d.f. of the $X_i$, $i = 1, 2, \ldots, n$. The distinction that
we make between these functions is brought out in the following
definition.

Definition 1. A function of one or more random variables that does
not depend upon any unknown parameter is called a statistic.

In accordance with this definition, the random variable $Y = \sum_{1}^{n} X_i$
discussed above is a statistic. But the random variable $Y = (X_1 - \mu)/\sigma$
is not a statistic unless $\mu$ and $\sigma$ are known numbers. It should be noted
that, although a statistic does not depend upon any unknown
parameter, the distribution of the statistic may very well depend upon
unknown parameters.

Remark. We remark, for the benefit of the more advanced reader, that a 
statistic is usually defined to be a measurable function of the random variables. 
In this book, however, we wish to minimize the use of measure theoretic 
terminology, so we have suppressed the modifier “measurable.” It is quite 
clear that a statistic is a random variable. In fact, some probabilists avoid the 
use of the word “statistic” altogether, and they refer to a measurable function 
of random variables as a random variable. We decided to use the word 
“statistic” because the reader will encounter it so frequently in books and 
journals. 

We can motivate the study of the distribution of a statistic in the
following way. Let a random variable $X$ be defined on a sample space
$\mathcal{C}$ and let the space of $X$ be denoted by $\mathcal{A}$. In many situations
confronting us, the distribution of $X$ is not completely known. For
instance, we may know the distribution except for the value of an
unknown parameter. To obtain more information about this
distribution (or the unknown parameter), we shall repeat under identical
conditions the random experiment $n$ independent times. Let the
random variable $X_i$ be a function of the $i$th outcome, $i = 1, 2, \ldots, n$.
Then we call $X_1, X_2, \ldots, X_n$ the observations of a random sample
from the distribution under consideration. Suppose that we can define
a statistic $Y = u(X_1, X_2, \ldots, X_n)$ whose p.d.f. is found to be $g(y)$.
Perhaps this p.d.f. shows that there is a great probability that $Y$ has
a value close to the unknown parameter. Once the experiment has been
repeated in the manner indicated and we have $X_1 = x_1, \ldots, X_n = x_n$,
then $y = u(x_1, x_2, \ldots, x_n)$ is a known number. It is to be hoped that
this known number can in some manner be used to elicit information
about the unknown parameter. Thus a statistic may prove to be useful.

Remarks. Let the random variable $X$ be defined as the diameter of a hole
to be drilled by a certain drill press and let it be assumed that $X$ has a normal
distribution. Past experience with many drill presses makes this assumption
plausible; but the assumption does not specify the mean $\mu$ nor the variance
$\sigma^2$ of this normal distribution. The only way to obtain information about $\mu$
and $\sigma^2$ is to have recourse to experimentation. Thus we shall drill a number,
say $n = 20$, of these holes whose diameters will be $X_1, X_2, \ldots, X_{20}$. Then
$X_1, X_2, \ldots, X_{20}$ is a random sample from the normal distribution under
consideration. Once the holes have been drilled and the diameters measured,
the 20 numbers may be used, as will be seen later, to elicit information about
$\mu$ and $\sigma^2$.

The term “random sample” is now defined in a more formal 
manner. 

Definition 2. Let $X_1, X_2, \ldots, X_n$ denote $n$ independent random
variables, each of which has the same but possibly unknown
p.d.f. $f(x)$; that is, the probability density functions of $X_1, X_2, \ldots, X_n$
are, respectively, $f_1(x_1) = f(x_1)$, $f_2(x_2) = f(x_2)$, $\ldots$, $f_n(x_n) = f(x_n)$, so
that the joint p.d.f. is $f(x_1)f(x_2)\cdots f(x_n)$. The random variables
$X_1, X_2, \ldots, X_n$ are then said to constitute a random sample from
a distribution that has p.d.f. $f(x)$. That is, the observations of a
random sample are independent and identically distributed (often
abbreviated i.i.d.).

Later we shall define what we mean by a random sample from a 
distribution of more than one random variable. 

Sometimes it is convenient to refer to a random sample of size
$n$ from a given distribution and, as has been remarked, to refer
to $X_1, X_2, \ldots, X_n$ as the observations of the random sample. A
reexamination of Example 2 of Section 2.5 reveals that we found the
p.d.f. of the statistic, which is the maximum of the observations
of a random sample of size $n = 3$, from a distribution with p.d.f.
$f(x) = 2x$, $0 < x < 1$, zero elsewhere. In Section 3.1 we found the
p.d.f. of the statistic, which is the sum of the observations of a random
sample of size $n$ from a distribution that has p.d.f. $f(x) = p^x(1 - p)^{1 - x}$,
$x = 0, 1$, zero elsewhere. This fact was also referred to at the beginning
of this section.

In this book, most of the statistics that we shall encounter will
be functions of the observations of a random sample from a given
distribution. Next, we define two important statistics of this type.

Definition 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$
from a given distribution. The statistic

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n}$$

is called the mean of the random sample, and the statistic

$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n} = \sum_{i=1}^{n} \frac{X_i^2}{n} - \bar{X}^2$$

is called the variance of the random sample.

Remarks. Many writers do not define the variance of a random sample
as we have done but, instead, they take $S^2 = \sum_{1}^{n} (X_i - \bar{X})^2/(n - 1)$. There are
good reasons for doing this. But a certain price has to be paid, as we shall
indicate. Let $x_1, x_2, \ldots, x_n$ denote experimental values of the random variable
$X$ that has the p.d.f. $f(x)$ and the distribution function $F(x)$. Thus we may look
upon $x_1, x_2, \ldots, x_n$ as the experimental values of a random sample of size $n$
from the given distribution. The distribution of the sample is then defined to
be the distribution obtained by assigning a probability of $1/n$ to each of
the points $x_1, x_2, \ldots, x_n$. This is a distribution of the discrete type. The
corresponding distribution function will be denoted by $F_n(x)$ and it is a step
function. If we let $f_x$ denote the number of sample values that are less than
or equal to $x$, then $F_n(x) = f_x/n$, so that $F_n(x)$ gives the relative frequency of
the event $X \le x$ in the set of $n$ observations. The function $F_n(x)$ is often called
the "empirical distribution function" and it has a number of uses.

Because the distribution of the sample is a discrete distribution, the mean
and the variance have been defined and are, respectively, $\sum_{1}^{n} x_i/n = \bar{x}$ and
$\sum_{1}^{n} (x_i - \bar{x})^2/n = s^2$. Thus, if one finds the distribution of the sample and the
associated empirical distribution function to be useful concepts, it would
seem logically inconsistent to define the variance of a random sample in any
way other than we have.
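The quantities named in these remarks are simple to compute. The following sketch, added for illustration, evaluates the sample mean, the sample variance with divisor $n$ (the convention of Definition 3), and the empirical distribution function $F_n(x)$. The function names and the four data values are assumptions made for the example.

```python
def sample_mean(xs):
    """x-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s^2 with divisor n, as in Definition 3 (not n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def empirical_cdf(xs):
    """Return F_n, where F_n(x) is the fraction of sample values <= x."""
    n = len(xs)

    def F_n(x):
        return sum(v <= x for v in xs) / n

    return F_n


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
    F_n = empirical_cdf(data)
    print(sample_mean(data), sample_variance(data), F_n(2.5))
```

Because each point carries probability $1/n$, the mean and variance of the distribution of the sample coincide with `sample_mean` and `sample_variance` as written here, which is the consistency argument the remarks make.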

We have also defined $\bar{X}$ and $S^2$ only for observations that are i.i.d., that
is, when $X_1, X_2, \ldots, X_n$ denote a random sample. However, statisticians often
use these symbols, $\bar{X}$ and $S^2$, even if the assumption of independence is
dropped. For example, suppose that $X_1, X_2, \ldots, X_n$ were the observations
taken at random from a finite collection of numbers without replacement.
These observations could be thought of as a sample and its mean $\bar{X}$ and
variance $S^2$ computed; yet $X_1, X_2, \ldots, X_n$ are dependent. Moreover, the $n$
observations could simply be some values, not necessarily taken from a
distribution, and we could compute the mean $\bar{X}$ and the variance $S^2$ associated
with these $n$ values. If we do these things, however, we must recognize the
conditions under which the observations were obtained, and we cannot make
the same statements that are associated with the mean and the variance of
what we call a random sample.

Random sampling distribution theory means the general problem
of finding distributions of functions of the observations of a random
sample. Up to this point, the only method, other than direct
probabilistic arguments, of finding the distribution of a function of one
or more random variables is the distribution function technique.
That is, if $X_1, X_2, \ldots, X_n$ are random variables, the distribution of
$Y = u(X_1, X_2, \ldots, X_n)$ is determined by computing the distribution
function of $Y$,

$$G(y) = \Pr[u(X_1, X_2, \ldots, X_n) \le y].$$

Even in what superficially appears to be a very simple problem, this 
can be quite tedious. This fact is illustrated in the next paragraph. 

Let $X_1, X_2, X_3$ denote a random sample of size 3 from a standard
normal distribution. Let $Y$ denote the statistic that is the sum of
the squares of the sample observations. The distribution function
of $Y$ is

$$G(y) = \Pr(X_1^2 + X_2^2 + X_3^2 \le y).$$

If $y < 0$, then $G(y) = 0$. However, if $y \ge 0$, then

$$G(y) = \iiint_A \frac{1}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2)\right] dx_1\, dx_2\, dx_3,$$

where $A$ is the set of points $(x_1, x_2, x_3)$ interior to, or on the surface of,
a sphere with center at $(0, 0, 0)$ and radius equal to $\sqrt{y}$. This is
not a simple integral. We might hope to make progress by changing
to spherical coordinates:

$$x_1 = \rho \cos\theta \sin\varphi, \qquad x_2 = \rho \sin\theta \sin\varphi, \qquad x_3 = \rho \cos\varphi,$$

where $\rho \ge 0$, $0 \le \theta < 2\pi$, $0 \le \varphi \le \pi$. Then, for $y \ge 0$,

$$G(y) = \int_0^{\sqrt{y}} \int_0^{2\pi} \int_0^{\pi} \frac{1}{(2\pi)^{3/2}}\, e^{-\rho^2/2} \rho^2 \sin\varphi\, d\varphi\, d\theta\, d\rho = \sqrt{\frac{2}{\pi}} \int_0^{\sqrt{y}} \rho^2 e^{-\rho^2/2}\, d\rho.$$

If we change the variable of integration by setting $\rho = \sqrt{w}$, we have

$$G(y) = \frac{1}{\sqrt{2\pi}} \int_0^{y} \sqrt{w}\, e^{-w/2}\, dw,$$

for $y \ge 0$. Since $Y$ is a random variable of the continuous type, the
p.d.f. of $Y$ is $g(y) = G'(y)$. Thus

$$g(y) = \frac{1}{\sqrt{2\pi}}\, y^{3/2 - 1} e^{-y/2}, \qquad 0 < y < \infty,$$

and $g(y) = 0$ elsewhere.

Because $\Gamma(\tfrac{3}{2}) = (\tfrac{1}{2})\Gamma(\tfrac{1}{2}) = (\tfrac{1}{2})\sqrt{\pi}$, and thus $\sqrt{2\pi} = \Gamma(\tfrac{3}{2})2^{3/2}$, we see that
$Y$ is $\chi^2(3)$.
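The one-dimensional integral obtained after the change to spherical coordinates can be evaluated numerically as a check on the derivation. The sketch below is illustrative only: it applies composite Simpson's rule to $\sqrt{2/\pi}\int_0^{\sqrt{y}} \rho^2 e^{-\rho^2/2}\, d\rho$ and compares the result with the closed form $\operatorname{erf}(\sqrt{y/2}) - \sqrt{2y/\pi}\, e^{-y/2}$. That closed form does not appear in the text; it is a standard expression that follows from an integration by parts, and the function names and the number of subintervals are assumptions.

```python
import math


def G_numeric(y: float, n: int = 1000) -> float:
    """sqrt(2/pi) * integral_0^sqrt(y) rho^2 exp(-rho^2/2) d rho (Simpson)."""
    if y <= 0:
        return 0.0
    upper = math.sqrt(y)
    h = upper / n  # n must be even for Simpson's rule

    def integrand(rho):
        return rho * rho * math.exp(-rho * rho / 2.0)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n):
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * integrand(k * h)
    return math.sqrt(2.0 / math.pi) * total * h / 3.0


def G_closed_form(y: float) -> float:
    """Chi-square(3) distribution function via the error function."""
    if y <= 0:
        return 0.0
    return (math.erf(math.sqrt(y / 2.0))
            - math.sqrt(2.0 * y / math.pi) * math.exp(-y / 2.0))


if __name__ == "__main__":
    for y in (0.5, 2.0, 7.815):
        print(y, G_numeric(y), G_closed_form(y))
```

The value at $y = 7.815$ is close to 0.95, in agreement with the tabled 95th percentile of a chi-square distribution with 3 degrees of freedom.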

The problem that we have just solved highlights the desirability of 
having, if possible, various methods of determining the distribution of 
a function of random variables. We shall find that other techniques are 
available and that often a particular technique is vastly superior to the 
others in a given situation. These techniques will be discussed in 
subsequent sections. 

Example 1. Let the random variable $Y$ be distributed uniformly over the
unit interval $0 < y < 1$; that is, the distribution function of $Y$ is

$$G(y) = \begin{cases} 0, & y \le 0, \\ y, & 0 < y < 1, \\ 1, & 1 \le y. \end{cases}$$

Suppose that $F(x)$ is a distribution function of the continuous type which is
strictly increasing when $0 < F(x) < 1$. If we define the random variable $X$
by the relationship $Y = F(X)$, we now show that $X$ has a distribution
which corresponds to $F(x)$. If $0 < F(x) < 1$, the inequalities $X \le x$ and
$F(X) \le F(x)$ are equivalent. Thus, with $0 < F(x) < 1$, the distribution of $X$ is

$$\Pr(X \le x) = \Pr[F(X) \le F(x)] = \Pr[Y \le F(x)]$$

because $Y = F(X)$. However, $\Pr(Y \le y) = G(y)$, so we have

$$\Pr(X \le x) = G[F(x)] = F(x), \qquad 0 < F(x) < 1.$$

That is, the distribution function of $X$ is $F(x)$.

This result permits us to simulate random variables of different
types. This is done by simply determining values of the uniform
variable $Y$, usually with a computer. Then, after determining the
observed value $Y = y$, solve the equation $y = F(x)$, either explicitly or
by numerical methods. This yields the inverse function $x = F^{-1}(y)$. By
the preceding result, this number $x$ will be an observed value of $X$ that
has distribution function $F(x)$.
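As a small sketch of this simulation recipe, take $F(x) = 1 - e^{-x}$, $0 < x < \infty$, for which $y = F(x)$ can be solved explicitly as $x = -\ln(1 - y)$. The choice of this distribution, the function names, the seed, and the sample size are assumptions made for the illustration.

```python
import math
import random


def exponential_inverse_cdf(y: float) -> float:
    """Solve y = F(x) = 1 - exp(-x) for x, where 0 <= y < 1."""
    return -math.log(1.0 - y)


def simulate(inverse_cdf, n: int, seed: int = 0):
    """Generate n observed values x = F^{-1}(y) from uniform values y."""
    rng = random.Random(seed)
    # rng.random() returns a value in [0, 1), so 1 - y stays positive.
    return [inverse_cdf(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    xs = simulate(exponential_inverse_cdf, 20_000)
    print(sum(xs) / len(xs))  # the mean of this distribution is 1
```

When $F$ has no explicit inverse, the same idea applies with a numerical root finder in place of `exponential_inverse_cdf`.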

It is also interesting to note that the converse of this result is true.
If $X$ has distribution function $F(x)$ of the continuous type, then
$Y = F(X)$ is uniformly distributed over $0 < y < 1$. The reason for this
is, for $0 < y < 1$, that

$$\Pr(Y \le y) = \Pr[F(X) \le y] = \Pr[X \le F^{-1}(y)].$$

However, it is given that $\Pr(X \le x) = F(x)$, so

$$\Pr(Y \le y) = F[F^{-1}(y)] = y, \qquad 0 < y < 1.$$

This is the distribution function of a random variable that is
distributed uniformly on the interval $(0, 1)$.

EXERCISES 

4.1. Show that 



where X O. 

I 

4.2. Find the probability that exactly four observations of a random
sample of size 5 from the distribution having p.d.f. $f(x) = (x + 1)/2$,
$-1 < x < 1$, zero elsewhere, exceed zero.

4.3. Let $X_1, X_2, X_3$ be a random sample of size 3 from a distribution that
is $N(6, 4)$. Determine the probability that the largest sample observation
exceeds 8.

4.4. What is the probability that at least one observation of a random
sample of size $n = 5$ from a continuous-type distribution exceeds the
90th percentile?

4.5. Let $X$ have the p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Show that
$Y = -2 \ln X^4$ is $\chi^2(2)$.

4.6. Let $X_1, X_2$ be a random sample of size $n = 2$ from a distribution with
p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Find the mean and the variance
of the ratio $Y = X_1/X_2$.

Hint: First find the distribution function $\Pr(Y \le y)$ when $0 < y < 1$ and
then when $1 \le y$.

4.7. Let X 2 be a random sample from the distribution having 
p.d.f. J{x) = 2x, 0 < jc < 1, zero elsewhere. Find Pr (X, /X 2 < and 

4.8. If the sample size is $n = 2$, find the constant $c$ so that $S^2 = c(X_1 - X_2)^2$.

4.9. If $x_i = i$, $i = 1, 2, \ldots, n$, compute the values of $\bar{x} = \sum x_i/n$ and
$s^2 = \sum (x_i - \bar{x})^2/n$.

4.10. Let $y_i = a + bx_i$, $i = 1, 2, \ldots, n$, where $a$ and $b$ are constants. Find
$\bar{y} = \sum y_i/n$ and $s_y^2 = \sum (y_i - \bar{y})^2/n$ in terms of $a$, $b$, $\bar{x} = \sum x_i/n$, and
$s_x^2 = \sum (x_i - \bar{x})^2/n$.

4.11. Let $X_1$ and $X_2$ denote two i.i.d. random variables, each from a
distribution that is $N(0, 1)$. Find the p.d.f. of $Y = X_1^2 + X_2^2$.

Hint: In the double integral representing $\Pr(Y \le y)$, use polar
coordinates.

4.12. The four values $y_1 = 0.42$, $y_2 = 0.31$, $y_3 = 0.87$, and $y_4 = 0.65$ represent
the observed values of a random sample of size $n = 4$ from the uniform
distribution over $0 < y < 1$. Using these four values, find a corresponding
observed random sample from a distribution that has p.d.f. $f(x) = e^{-x}$,
$0 < x < \infty$, zero elsewhere.

4.13. Let $X_1, X_2$ denote a random sample of size 2 from a distribution with
p.d.f. $f(x) = \tfrac{1}{2}$, $0 < x < 2$, zero elsewhere. Find the joint p.d.f. of $X_1$ and $X_2$.
Let $Y = X_1 + X_2$. Find the distribution function and the p.d.f. of $Y$.

4.14. Let $X_1, X_2$ denote a random sample of size 2 from a distribution with
p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere. Find the distribution function
and the p.d.f. of $Y = X_1/X_2$.

4.15. Let $X_1, X_2, X_3$ be three i.i.d. random variables, each from a
distribution having p.d.f. $f(x) = 5x^4$, $0 < x < 1$, zero elsewhere. Let $Y$ be the
largest observation in the sample. Find the distribution function and p.d.f.
of $Y$.


4.16. Let $X_1$ and $X_2$ be observations of a random sample from a distribution
with p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Evaluate the conditional
probability $\Pr(X_1 < X_2 \mid X_1 < 2X_2)$.

4.2 Transformations of Variables of the Discrete Type 

An alternative method of finding the distribution of a function 
of one or more random variables is called the change-of-variable 
technique. There are some delicate questions (with particular reference 
to random variables of the continuous type) involved in this technique, 
and these make it desirable for us first to consider special cases. 

Let $X$ have the Poisson p.d.f.

$$f(x) = \begin{cases} \dfrac{\mu^x e^{-\mu}}{x!}, & x = 0, 1, 2, \ldots, \\ 0, & \text{elsewhere.} \end{cases}$$

As we have done before, let $\mathcal{A}$ denote the space $\mathcal{A} = \{x : x =
0, 1, 2, \ldots\}$, so that $\mathcal{A}$ is the set where $f(x) > 0$. Define a new
random variable $Y$ by $Y = 4X$. We wish to find the p.d.f. of $Y$ by
the change-of-variable technique. Let $y = 4x$. We call $y = 4x$ a
transformation from $x$ to $y$, and we say that the transformation maps
the space $\mathcal{A}$ onto the space $\mathcal{B} = \{y : y = 0, 4, 8, 12, \ldots\}$. The space $\mathcal{B}$
is obtained by transforming each point in $\mathcal{A}$ in accordance with $y = 4x$.
We note two things about this transformation. It is such that to each
point in $\mathcal{A}$ there corresponds one, and only one, point in $\mathcal{B}$; and
conversely, to each point in $\mathcal{B}$ there corresponds one, and only one,
point in $\mathcal{A}$. That is, the transformation $y = 4x$ sets up a one-to-one
correspondence between the points of $\mathcal{A}$ and those of $\mathcal{B}$. Any function
$y = u(x)$ (not merely $y = 4x$) that maps a space $\mathcal{A}$ (not merely our $\mathcal{A}$)
onto a space $\mathcal{B}$ (not merely our $\mathcal{B}$) such that there is a one-to-one
correspondence between the points of $\mathcal{A}$ and those of $\mathcal{B}$ is called a
one-to-one transformation. It is important to note that a one-to-one
transformation, $y = u(x)$, implies that $x$ is a single-valued function of
$y$. In our case this is obviously true, since $y = 4x$ requires that $x = (\frac{1}{4})y$.

Our problem is that of finding the p.d.f. $g(y)$ of the discrete type
of random variable $Y = 4X$. Now $g(y) = \Pr(Y = y)$. Because there is
a one-to-one correspondence between the points of $\mathcal{A}$ and those of
$\mathcal{B}$, the event $Y = y$ or $4X = y$ can occur when, and only when, the event
$X = (\frac{1}{4})y$ occurs. That is, the two events are equivalent and have the
same probability. Hence

$$g(y) = \Pr(Y = y) = \Pr\left(X = \frac{y}{4}\right) = \begin{cases} \dfrac{\mu^{y/4} e^{-\mu}}{(y/4)!}, & y = 0, 4, 8, \ldots, \\ 0, & \text{elsewhere.} \end{cases}$$

The foregoing detailed discussion should make the subsequent text
easier to read. Let $X$ be a random variable of the discrete type, having
p.d.f. $f(x)$. Let $\mathcal{A}$ denote the set of discrete points, at each of which
$f(x) > 0$, and let $y = u(x)$ define a one-to-one transformation that maps
$\mathcal{A}$ onto $\mathcal{B}$. If we solve $y = u(x)$ for $x$ in terms of $y$, say, $x = w(y)$, then
for each $y \in \mathcal{B}$, we have $x = w(y) \in \mathcal{A}$. Consider the random variable
$Y = u(X)$. If $y \in \mathcal{B}$, then $x = w(y) \in \mathcal{A}$, and the events $Y = y$ [or
$u(X) = y$] and $X = w(y)$ are equivalent. Accordingly, the p.d.f. of $Y$ is

$$g(y) = \Pr(Y = y) = \Pr[X = w(y)] = \begin{cases} f[w(y)], & y \in \mathcal{B}, \\ 0, & \text{elsewhere.} \end{cases}$$
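The rule $g(y) = f[w(y)]$ amounts to relabeling each point of the support. The following Python sketch illustrates it under stated assumptions: the helper name `transform_pmf` is hypothetical, and the p.d.f. used, that of a $b(3, \frac{1}{2})$ distribution with $u(x) = x^2$, is chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def transform_pmf(pmf, u):
    """Return the p.d.f. of Y = u(X) as a dict, for u one-to-one on the support."""
    out = {}
    for x, p in pmf.items():
        y = u(x)
        if y in out:
            # Two points of the support map to the same y: not one-to-one.
            raise ValueError("u is not one-to-one on the support")
        out[y] = p  # g(y) = f[w(y)]: the mass at x moves unchanged to y = u(x)
    return out

# Illustrative assumption: X is b(3, 1/2), so f(x) = C(3, x) / 8 on x = 0, 1, 2, 3.
f = {x: Fraction(comb(3, x), 8) for x in range(4)}
g = transform_pmf(f, lambda x: x * x)  # support becomes {0, 1, 4, 9}
```

The explicit check for repeated values of $y$ mirrors the requirement in the text that the transformation be one-to-one on the set where $f(x) > 0$.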


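Example 1. Let $X$ have the binomial p.d.f.

$$f(x) = \begin{cases} \dfrac{3!}{x!\,(3 - x)!} \left(\dfrac{2}{3}\right)^x \left(\dfrac{1}{3}\right)^{3 - x}, & x = 0, 1, 2, 3, \\ 0, & \text{elsewhere.} \end{cases}$$

We seek the p.d.f. $g(y)$ of the random variable $Y = X^2$. The transformation
$y = u(x) = x^2$ maps $\mathcal{A} = \{x : x = 0, 1, 2, 3\}$ onto $\mathcal{B} = \{y : y = 0, 1, 4, 9\}$. In
general, $y = x^2$ does not define a one-to-one transformation; here, however, it
does, for there are no negative values of $x$ in $\mathcal{A} = \{x : x = 0, 1, 2, 3\}$. That is,
we have the single-valued inverse function $x = w(y) = \sqrt{y}$ (not $-\sqrt{y}$), and
so

$$g(y) = f(\sqrt{y}) = \begin{cases} \dfrac{3!}{(\sqrt{y})!\,(3 - \sqrt{y})!} \left(\dfrac{2}{3}\right)^{\sqrt{y}} \left(\dfrac{1}{3}\right)^{3 - \sqrt{y}}, & y = 0, 1, 4, 9, \\ 0, & \text{elsewhere.} \end{cases}$$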

There are no essential difficulties involved in a problem like
the following. Let $f(x_1, x_2)$ be the joint p.d.f. of two discrete-type
random variables $X_1$ and $X_2$ with $\mathcal{A}$ the (two-dimensional) set of
points at which $f(x_1, x_2) > 0$. Let $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$
define a one-to-one transformation that maps $\mathcal{A}$ onto $\mathcal{B}$. The joint
p.d.f. of the two new random variables $Y_1 = u_1(X_1, X_2)$ and $Y_2 =
u_2(X_1, X_2)$ is given by

$$g(y_1, y_2) = \begin{cases} f[w_1(y_1, y_2), w_2(y_1, y_2)], & (y_1, y_2) \in \mathcal{B}, \\ 0, & \text{elsewhere,} \end{cases}$$

where $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$ is the single-valued inverse of
$y_1 = u_1(x_1, x_2)$, $y_2 = u_2(x_1, x_2)$. From this joint p.d.f. $g(y_1, y_2)$ we may
obtain the marginal p.d.f. of $Y_1$ by summing on $y_2$ or the marginal
p.d.f. of $Y_2$ by summing on $y_1$.

Perhaps it should be emphasized that the technique of change
of variables involves the introduction of as many "new" variables
as there were "old" variables. That is, suppose that $f(x_1, x_2, x_3)$ is the
joint p.d.f. of $X_1$, $X_2$, and $X_3$, with $\mathcal{A}$ the set where $f(x_1, x_2, x_3) > 0$.
Let us say we seek the p.d.f. of $Y_1 = u_1(X_1, X_2, X_3)$. We would then
define (if possible) $Y_2 = u_2(X_1, X_2, X_3)$ and $Y_3 = u_3(X_1, X_2, X_3)$, so
that $y_1 = u_1(x_1, x_2, x_3)$, $y_2 = u_2(x_1, x_2, x_3)$, $y_3 = u_3(x_1, x_2, x_3)$ define a
one-to-one transformation of $\mathcal{A}$ onto $\mathcal{B}$. This would enable us to find
the joint p.d.f. of $Y_1$, $Y_2$, and $Y_3$ from which we would get the marginal
p.d.f. of $Y_1$ by summing on $y_2$ and $y_3$.


Example 2. Let $X_1$ and $X_2$ be two independent random variables that have
Poisson distributions with means $\mu_1$ and $\mu_2$, respectively. The joint p.d.f. of
$X_1$ and $X_2$ is

$$\frac{\mu_1^{x_1} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{x_1!\, x_2!}, \qquad x_1 = 0, 1, 2, 3, \ldots, \quad x_2 = 0, 1, 2, 3, \ldots,$$

and is zero elsewhere. Thus the space $\mathcal{A}$ is the set of points $(x_1, x_2)$, where
each of $x_1$ and $x_2$ is a nonnegative integer. We wish to find the p.d.f. of
$Y_1 = X_1 + X_2$. If we use the change of variable technique, we need to define
a second random variable $Y_2$. Because $Y_2$ is of no interest to us, let us
choose it in such a way that we have a simple one-to-one transformation.
For example, take $Y_2 = X_2$. Then $y_1 = x_1 + x_2$ and $y_2 = x_2$ represent a
one-to-one transformation that maps $\mathcal{A}$ onto

$$\mathcal{B} = \{(y_1, y_2) : y_2 = 0, 1, \ldots, y_1 \text{ and } y_1 = 0, 1, 2, \ldots\}.$$

Note that, if $(y_1, y_2) \in \mathcal{B}$, then $0 \le y_2 \le y_1$. The inverse functions are given by
$x_1 = y_1 - y_2$ and $x_2 = y_2$. Thus the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \frac{\mu_1^{y_1 - y_2} \mu_2^{y_2} e^{-\mu_1 - \mu_2}}{(y_1 - y_2)!\, y_2!}, \qquad (y_1, y_2) \in \mathcal{B},$$

and is zero elsewhere. Consequently, the marginal p.d.f. of $Y_1$ is given by

$$g_1(y_1) = \sum_{y_2 = 0}^{y_1} g(y_1, y_2) = \frac{e^{-\mu_1 - \mu_2}}{y_1!} \sum_{y_2 = 0}^{y_1} \frac{y_1!}{(y_1 - y_2)!\, y_2!}\, \mu_1^{y_1 - y_2} \mu_2^{y_2} = \frac{(\mu_1 + \mu_2)^{y_1} e^{-\mu_1 - \mu_2}}{y_1!}, \qquad y_1 = 0, 1, 2, \ldots,$$

and is zero elsewhere. That is, $Y_1 = X_1 + X_2$ has a Poisson distribution with
parameter $\mu_1 + \mu_2$.
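The conclusion of Example 2 can be checked numerically. The sketch below sums the joint p.d.f. over $y_2$ and compares the result with the Poisson p.d.f. having parameter $\mu_1 + \mu_2$. The function names and the values $\mu_1 = 1.5$, $\mu_2 = 2.0$ are assumptions made only for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson p.d.f. mu^k e^(-mu) / k! at a nonnegative integer k."""
    return mu ** k * exp(-mu) / factorial(k)

def sum_pmf(y1, mu1, mu2):
    """Marginal p.d.f. of Y1 = X1 + X2: sum g(y1, y2) over y2 = 0, ..., y1."""
    return sum(
        poisson_pmf(y1 - y2, mu1) * poisson_pmf(y2, mu2)
        for y2 in range(y1 + 1)
    )

mu1, mu2 = 1.5, 2.0  # illustrative means, not taken from the text
max_gap = max(
    abs(sum_pmf(y1, mu1, mu2) - poisson_pmf(y1, mu1 + mu2))
    for y1 in range(20)
)
```

Here `max_gap` should differ from zero only by floating-point rounding, in agreement with the binomial-expansion argument above.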

Remark. It should be noted that Example 2 essentially illustrates the
distribution function technique too. That is, without defining $Y_2 = X_2$, we
have that the distribution function of $Y_1 = X_1 + X_2$ is

$$G_1(y_1) = \Pr(X_1 + X_2 \le y_1).$$

In this discrete case, with $y_1 = 0, 1, 2, \ldots$, the p.d.f. of $Y_1$ is equal to

$$g_1(y_1) = G_1(y_1) - G_1(y_1 - 1) = \Pr(X_1 + X_2 = y_1).$$

That is,

$$g_1(y_1) = \sum_{x_1 + x_2 = y_1} \frac{\mu_1^{x_1} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{x_1!\, x_2!}.$$

This summation is over all points of $\mathcal{A}$ such that $x_1 + x_2 = y_1$ and thus can
be written as

$$g_1(y_1) = \sum_{x_2 = 0}^{y_1} \frac{\mu_1^{y_1 - x_2} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{(y_1 - x_2)!\, x_2!},$$

which is exactly the summation given in Example 2.

Example 3. In Section 4.1, we found that we could simulate a
continuous-type random variable $X$ with distribution function $F(x)$ through
$X = F^{-1}(Y)$, where $Y$ has a uniform distribution on $0 < y < 1$. In a sense, we
can simulate a discrete-type random variable $X$ in much the same way, but
we must understand what $X = F^{-1}(Y)$ means in this case. Here $F(x)$ is a step
function with the height of the step at $x = x_0$ equal to $\Pr(X = x_0)$. For
illustration, in Example 3 of Section 1.5, $\Pr(X = 3) = \frac{3}{6}$ is the height of the
step at $x = 3$ in Figure 1.3 that depicts the distribution function. If we now
think of selecting a random point $Y$, having the uniform distribution on
$0 < y \le 1$, on the vertical axis of Figure 1.3, the probability of falling between
$\frac{3}{6}$ and 1 is $\frac{3}{6}$. However, if it falls between those two values, the horizontal line
drawn from it would "hit" the step at $x = 3$. That is, for $\frac{3}{6} < y \le 1$, then
$F^{-1}(y) = 3$. Of course, if $\frac{1}{6} < y \le \frac{3}{6}$, then $F^{-1}(y) = 2$; and if $0 < y \le \frac{1}{6}$, we have
$F^{-1}(y) = 1$. Thus, with this procedure, we can generate the numbers $x = 1$,
$x = 2$, and $x = 3$ with respective probabilities $\frac{1}{6}$, $\frac{2}{6}$, and $\frac{3}{6}$, as we desired.
Clearly, this procedure can be generalized to simulate any random variable
$X$ of the discrete type.
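The step-function reading of $F^{-1}$ in Example 3 translates directly into a short program. The following Python sketch is one possible implementation, not a prescribed one: the helper name `make_discrete_sampler` is hypothetical, and the p.d.f. $f(x) = x/15$, $x = 1, \ldots, 5$, is borrowed from Exercise 4.23 purely as test data.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def make_discrete_sampler(values, probs):
    """Return F^{-1} for a discrete distribution given as parallel lists."""
    cum = list(accumulate(probs))  # heights F(x) reached after each step

    def inverse_cdf(y):
        # Smallest x with F(x) >= y: the step that a horizontal line
        # drawn at height y on the vertical axis would hit.
        i = bisect_left(cum, y)
        return values[min(i, len(values) - 1)]  # guard against rounding at y near 1

    return inverse_cdf

values = [1, 2, 3, 4, 5]
probs = [x / 15 for x in values]  # f(x) = x/15, used here only as an example
inv = make_discrete_sampler(values, probs)

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [inv(rng.random()) for _ in range(30000)]
freq = {x: draws.count(x) / len(draws) for x in values}
```

With a large number of draws, the relative frequencies in `freq` should lie close to the probabilities in `probs`.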

EXERCISES 

4.17. Let $X$ have a p.d.f. $f(x) = \frac{1}{3}$, $x = 1, 2, 3$, zero elsewhere. Find the p.d.f.
of $Y = 2X + 1$.

4.18. If f{x u x 2 ) = (^.,^) = (0,0),(0, 1),(1,0),(1, 1), 

zero elsewhere, is the joint p.d.f. of $X_1$ and $X_2$, find the joint p.d.f. of
$Y_1 = X_1 - X_2$ and $Y_2 = X_1 + X_2$.

4.19. Let $X$ have the p.d.f. $f(x) = (\frac{1}{2})^x$, $x = 1, 2, 3, \ldots$, zero elsewhere. Find
the p.d.f. of $Y = X^3$.

4.20. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = x_1 x_2/36$, $x_1 = 1, 2, 3$ and
$x_2 = 1, 2, 3$, zero elsewhere. Find first the joint p.d.f. of $Y_1 = X_1 X_2$ and
$Y_2 = X_2$, and then find the marginal p.d.f. of $Y_1$.

4.21. Let the independent random variables $X_1$ and $X_2$ be $b(n_1, p)$ and $b(n_2, p)$,
respectively. Find the joint p.d.f. of $Y_1 = X_1 + X_2$ and $Y_2 = X_2$, and then
find the marginal p.d.f. of $Y_1$.

Hint: Use the fact that

$$\sum_{w = 0}^{k} \binom{n_1}{w} \binom{n_2}{k - w} = \binom{n_1 + n_2}{k}.$$

This can be proved by comparing the coefficients of $x^k$ in each member of
the identity $(1 + x)^{n_1} (1 + x)^{n_2} = (1 + x)^{n_1 + n_2}$.

4.22. Let $X_1$ and $X_2$ be independent random variables of the discrete type with
joint p.d.f. $f_1(x_1) f_2(x_2)$, $(x_1, x_2) \in \mathcal{A}$. Let $y_1 = u_1(x_1)$ and $y_2 = u_2(x_2)$ denote
a one-to-one transformation that maps $\mathcal{A}$ onto $\mathcal{B}$. Show that $Y_1 = u_1(X_1)$
and $Y_2 = u_2(X_2)$ are independent.

4.23. Consider the random variable $X$ with p.d.f. $f(x) = x/15$, $x = 1, 2, 3, 4,
5$, and zero elsewhere.

(a) Graph the distribution function $F(x)$ of $X$.

(b) Using a computer or a table of random numbers, determine 30 values
of $Y$, which has the (approximate) uniform distribution on $0 < y < 1$.

(c) From these 30 values of $Y$, find the corresponding 30 values of $X$ and
determine the relative frequencies of $x = 1$, $x = 2$, $x = 3$, $x = 4$, and
$x = 5$. How do these compare to the respective probabilities of
$\frac{1}{15}, \frac{2}{15}, \frac{3}{15}, \frac{4}{15}, \frac{5}{15}$?

4.24. Using the technique given in Example 3 and Exercise 4.23, generate 50
values having a Poisson distribution with $\mu = 1$.

Hint: Use Table I in Appendix B.

4.3 Transformations of Variables of the Continuous Type

In the preceding section we introduced the notion of a one-to-one 
transformation and the mapping of a set $\mathcal{A}$ onto a set $\mathcal{B}$ under that
transformation. Those ideas were sufficient to enable us to find the 
distribution of a function of several random variables of the discrete 
type. In this section we examine the same problem when the random 
variables are of the continuous type. It is again helpful to begin with 
a special problem. 

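Example 1. Let $X$ be a random variable of the continuous type, having
p.d.f.

$$f(x) = \begin{cases} 2x, & 0 < x < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Here $\mathcal{A}$ is the space $\{x : 0 < x < 1\}$, where $f(x) > 0$. Define the random
variable $Y$ by $Y = 8X^3$ and consider the transformation $y = 8x^3$. Under
the transformation $y = 8x^3$, the set $\mathcal{A}$ is mapped onto the set $\mathcal{B} =
\{y : 0 < y < 8\}$, and, moreover, the transformation is one-to-one. For every
$0 < a < b < 8$, the event $a < Y < b$ will occur when, and only when, the
event $\frac{1}{2}\sqrt[3]{a} < X < \frac{1}{2}\sqrt[3]{b}$ occurs because there is a one-to-one correspondence
between the points of $\mathcal{A}$ and $\mathcal{B}$. Thus

$$\Pr(a < Y < b) = \Pr\left(\tfrac{1}{2}\sqrt[3]{a} < X < \tfrac{1}{2}\sqrt[3]{b}\right) = \int_{\sqrt[3]{a}/2}^{\sqrt[3]{b}/2} 2x \, dx.$$

Let us rewrite this integral by changing the variable of integration from $x$ to
$y$ by writing $y = 8x^3$ or $x = \frac{1}{2}\sqrt[3]{y}$. Now

$$\frac{dx}{dy} = \frac{1}{6y^{2/3}},$$

and, accordingly, we have

$$\Pr(a < Y < b) = \int_a^b 2\left(\frac{\sqrt[3]{y}}{2}\right)\left(\frac{1}{6y^{2/3}}\right) dy = \int_a^b \frac{1}{6y^{1/3}} \, dy.$$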

Since this is true for every $0 < a < b < 8$, the p.d.f. $g(y)$ of $Y$ is the integrand;
that is,

$$g(y) = \begin{cases} \dfrac{1}{6y^{1/3}}, & 0 < y < 8, \\ 0, & \text{elsewhere.} \end{cases}$$

It is worth noting that we found the p.d.f. of the random variable
$Y = 8X^3$ by using a theorem on the change of variable in a definite
integral. However, to obtain $g(y)$ we actually need only two things:
(1) the set $\mathcal{B}$ of points $y$ where $g(y) > 0$ and (2) the integrand of the
integral on $y$ to which $\Pr(a < Y < b)$ is equal. These can be found
by two simple rules:

1. Verify that the transformation $y = 8x^3$ maps $\mathcal{A} = \{x : 0 < x < 1\}$
onto $\mathcal{B} = \{y : 0 < y < 8\}$ and that the transformation is one-to-one.

2. Determine $g(y)$ on this set $\mathcal{B}$ by substituting $\frac{1}{2}\sqrt[3]{y}$ for $x$ in $f(x)$
and then multiplying this result by the derivative of $\frac{1}{2}\sqrt[3]{y}$. That
is,

$$g(y) = \begin{cases} f\left(\dfrac{\sqrt[3]{y}}{2}\right) \dfrac{d\left[\tfrac{1}{2}\sqrt[3]{y}\right]}{dy} = \dfrac{1}{6y^{1/3}}, & 0 < y < 8, \\ 0, & \text{elsewhere.} \end{cases}$$
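As a rough empirical check of the p.d.f. just obtained for $Y = 8X^3$, one may simulate $X$ and compare relative frequencies with the distribution function $G(y) = \frac{1}{4}y^{2/3}$, $0 < y < 8$, which follows by integrating $g(y)$. The sketch below assumes that $X$ is generated as $\sqrt{U}$ with $U$ uniform on $(0, 1)$, since $F(x) = x^2$ when $f(x) = 2x$; the function names, seed, and sample size are arbitrary choices.

```python
import random

def g(y):
    """p.d.f. of Y = 8 X^3 when X has p.d.f. 2x on 0 < x < 1."""
    return 1.0 / (6.0 * y ** (1.0 / 3.0)) if 0.0 < y < 8.0 else 0.0

def G(y):
    """Distribution function of Y on 0 < y < 8, the integral of g from 0 to y."""
    return 0.25 * y ** (2.0 / 3.0)

rng = random.Random(1)  # fixed seed for reproducibility
# X = sqrt(U) has distribution function x^2, hence p.d.f. 2x on (0, 1).
ys = [8.0 * (rng.random() ** 0.5) ** 3 for _ in range(200000)]

# Empirical Pr(Y <= c) at a few cut points, to be compared with G(c).
emp = {c: sum(y <= c for y in ys) / len(ys) for c in (1.0, 4.0, 7.0)}
```

The values in `emp` should agree with `G(c)` up to simulation error, which supports the two-rule recipe without replacing the analytic argument.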

We shall accept a theorem in analysis on the change of variable in
a definite integral to enable us to state a more general result. Let $X$ be
a random variable of the continuous type having p.d.f. $f(x)$. Let $\mathcal{A}$
be the one-dimensional space where $f(x) > 0$. Consider the random
variable $Y = u(X)$, where $y = u(x)$ defines a one-to-one transformation
that maps the set $\mathcal{A}$ onto the set $\mathcal{B}$. Let the inverse of $y = u(x)$
be denoted by $x = w(y)$, and let the derivative $dx/dy = w'(y)$ be
continuous and not equal zero for all points $y$ in $\mathcal{B}$. Then the p.d.f.
of the random variable $Y = u(X)$ is given by

$$g(y) = \begin{cases} f[w(y)]\,|w'(y)|, & y \in \mathcal{B}, \\ 0, & \text{elsewhere,} \end{cases}$$

where $|w'(y)|$ represents the absolute value of $w'(y)$. This is precisely
what we did in Example 1 of this section, except there we deliberately
chose $y = 8x^3$ to be an increasing function so that

$$\frac{dx}{dy} = w'(y) = \frac{1}{6y^{2/3}}, \qquad 0 < y < 8,$$

is positive, and hence $|w'(y)| = w'(y)$.

Henceforth, we shall refer to $dx/dy = w'(y)$ as the Jacobian (denoted
by $J$) of the transformation. In most mathematical areas, $J = w'(y)$ is
referred to as the Jacobian of the inverse transformation $x = w(y)$, but
in this book it will be called the Jacobian of the transformation, simply
for convenience.

Example 2. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

We are to show that the random variable $Y = -2 \ln X$ has a chi-square
distribution with 2 degrees of freedom. Here the transformation
is $y = u(x) = -2 \ln x$, so that $x = w(y) = e^{-y/2}$. The space $\mathcal{A}$ is $\mathcal{A} =
\{x : 0 < x < 1\}$, which the one-to-one transformation $y = -2 \ln x$ maps onto
$\mathcal{B} = \{y : 0 < y < \infty\}$. The Jacobian of the transformation is

$$J = \frac{dx}{dy} = w'(y) = -\frac{1}{2} e^{-y/2}.$$

Accordingly, the p.d.f. $g(y)$ of $Y = -2 \ln X$ is

$$g(y) = \begin{cases} f(e^{-y/2})\,|J| = \frac{1}{2} e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{elsewhere,} \end{cases}$$

a p.d.f. that is chi-square with 2 degrees of freedom. Note that this problem
was first proposed in Exercise 3.46.

This method of finding the p.d.f. of a function of one random
variable of the continuous type will now be extended to functions of
two random variables of this type. Again, only functions that define
a one-to-one transformation will be considered at this time. Let
$y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ define a one-to-one transformation
that maps a (two-dimensional) set $\mathcal{A}$ in the $x_1 x_2$-plane onto a
(two-dimensional) set $\mathcal{B}$ in the $y_1 y_2$-plane. If we express each of $x_1$ and
$x_2$ in terms of $y_1$ and $y_2$, we can write $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$.
The determinant of order 2,

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix},$$

is called the Jacobian of the transformation and will be denoted
by the symbol $J$. It will be assumed that these first-order partial
derivatives are continuous and that the Jacobian $J$ is not identically
equal to zero in $\mathcal{B}$. An illustrative example may be
desirable before we proceed with the extension of the change of
variable technique to two random variables of the continuous
type.

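Example 3. Let $\mathcal{A}$ be the set $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < 1,\ 0 < x_2 < 1\}$
depicted in Figure 4.1. We wish to determine the set $\mathcal{B}$ in the $y_1 y_2$-plane that
is the mapping of $\mathcal{A}$ under the one-to-one transformation

$$y_1 = u_1(x_1, x_2) = x_1 + x_2,$$
$$y_2 = u_2(x_1, x_2) = x_1 - x_2,$$

and we wish to compute the Jacobian of the transformation. Now

$$x_1 = w_1(y_1, y_2) = \tfrac{1}{2}(y_1 + y_2),$$
$$x_2 = w_2(y_1, y_2) = \tfrac{1}{2}(y_1 - y_2).$$

To determine the set $\mathcal{B}$ in the $y_1 y_2$-plane onto which $\mathcal{A}$ is mapped under the
transformation, note that the boundaries of $\mathcal{A}$ are transformed as follows into
the boundaries of $\mathcal{B}$:

$$\begin{aligned}
x_1 = 0 \quad &\text{into} \quad 0 = \tfrac{1}{2}(y_1 + y_2), \\
x_1 = 1 \quad &\text{into} \quad 1 = \tfrac{1}{2}(y_1 + y_2), \\
x_2 = 0 \quad &\text{into} \quad 0 = \tfrac{1}{2}(y_1 - y_2), \\
x_2 = 1 \quad &\text{into} \quad 1 = \tfrac{1}{2}(y_1 - y_2).
\end{aligned}$$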


FIGURE 4.1 (the set $\mathcal{A}$: the square with corner $(0, 0)$ bounded by $x_1 = 0$, $x_1 = 1$, $x_2 = 0$, and $x_2 = 1$)

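Accordingly, $\mathcal{B}$ is shown in Figure 4.2. Finally,

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[1mm] \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\frac{1}{2}.$$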


Remark. Although, in Example 3, we suggest transforming the boundaries
of $\mathcal{A}$, others might want to use the inequalities

$$0 < x_1 < 1 \quad \text{and} \quad 0 < x_2 < 1$$

directly. These four inequalities become

$$0 < \tfrac{1}{2}(y_1 + y_2) < 1 \quad \text{and} \quad 0 < \tfrac{1}{2}(y_1 - y_2) < 1.$$

It is easy to see that these are equivalent to

$$-y_1 < y_2, \qquad y_2 < 2 - y_1, \qquad y_2 < y_1, \qquad y_1 - 2 < y_2;$$

and they define the set $\mathcal{B}$. In this example, these methods were rather simple
and essentially the same. Other examples could present more complicated
transformations, and only experience can help one decide which is the best
method in each case.
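The mapping of Example 3 and its Jacobian can also be examined numerically. In the Python sketch below, the function names are hypothetical, the set $\mathcal{B}$ is encoded by the four inequalities of the Remark, and the Jacobian is approximated by central differences rather than computed symbolically.

```python
import random

def forward(x1, x2):
    """y1 = x1 + x2, y2 = x1 - x2."""
    return x1 + x2, x1 - x2

def inverse(y1, y2):
    """x1 = (y1 + y2)/2, x2 = (y1 - y2)/2."""
    return 0.5 * (y1 + y2), 0.5 * (y1 - y2)

def in_B(y1, y2):
    """The four inequalities that define the image set."""
    return -y1 < y2 and y2 < 2 - y1 and y2 < y1 and y1 - 2 < y2

def jacobian(y1, y2, h=1e-6):
    """Central-difference approximation to the determinant d(x1, x2)/d(y1, y2)."""
    a = (inverse(y1 + h, y2)[0] - inverse(y1 - h, y2)[0]) / (2 * h)  # dx1/dy1
    b = (inverse(y1, y2 + h)[0] - inverse(y1, y2 - h)[0]) / (2 * h)  # dx1/dy2
    c = (inverse(y1 + h, y2)[1] - inverse(y1 - h, y2)[1]) / (2 * h)  # dx2/dy1
    d = (inverse(y1, y2 + h)[1] - inverse(y1, y2 - h)[1]) / (2 * h)  # dx2/dy2
    return a * d - b * c

rng = random.Random(2)
# Interior points of the unit square (kept away from the boundary on purpose).
pts = [(rng.uniform(0.001, 0.999), rng.uniform(0.001, 0.999)) for _ in range(1000)]
all_inside = all(in_B(*forward(x1, x2)) for x1, x2 in pts)
```

Every sampled point of the square should land inside the set described by the inequalities, and `jacobian` should return a value near $-\frac{1}{2}$ at any point, since the transformation is linear.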


We now proceed with the problem of finding the joint p.d.f. of
two functions of two continuous-type random variables. Let $X_1$ and $X_2$
be random variables of the continuous type, having joint p.d.f.
$h(x_1, x_2)$. Let $\mathcal{A}$ be the two-dimensional set in the $x_1 x_2$-plane where
$h(x_1, x_2) > 0$. Let $Y_1 = u_1(X_1, X_2)$ be a random variable whose p.d.f.
is to be found. If $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ define a one-to-one
transformation of $\mathcal{A}$ onto a set $\mathcal{B}$ in the $y_1 y_2$-plane (with
nonidentically zero Jacobian), we can find, by use of a theorem in
analysis, the joint p.d.f. of $Y_1 = u_1(X_1, X_2)$ and $Y_2 = u_2(X_1, X_2)$. Let
$A$ be a subset of $\mathcal{A}$, and let $B$ denote the mapping of $A$ under the
one-to-one transformation (see Figure 4.3). The events $(X_1, X_2) \in A$
and $(Y_1, Y_2) \in B$ are equivalent. Hence

$$\Pr[(Y_1, Y_2) \in B] = \Pr[(X_1, X_2) \in A] = \iint_A h(x_1, x_2) \, dx_1 \, dx_2.$$

We wish now to change variables of integration by writing $y_1 = u_1(x_1, x_2)$,
$y_2 = u_2(x_1, x_2)$, or $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$. It has been
proved in analysis that this change of variables requires

$$\iint_A h(x_1, x_2) \, dx_1 \, dx_2 = \iint_B h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J| \, dy_1 \, dy_2.$$

Thus, for every set $B$ in $\mathcal{B}$,

$$\Pr[(Y_1, Y_2) \in B] = \iint_B h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J| \, dy_1 \, dy_2,$$

which implies that the joint p.d.f. $g(y_1, y_2)$ of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \begin{cases} h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J|, & (y_1, y_2) \in \mathcal{B}, \\ 0, & \text{elsewhere.} \end{cases}$$

Accordingly, the marginal p.d.f. $g_1(y_1)$ of $Y_1$ can be obtained from the
joint p.d.f. $g(y_1, y_2)$ in the usual manner by integrating on $y_2$. Several
examples of this result will be given.

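FIGURE 4.3 (the set $A$ and its mapping $B$ under the transformation)

Example 4. Let the random variable $X$ have the p.d.f.

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0, & \text{elsewhere,} \end{cases}$$

and let $X_1, X_2$ denote a random sample from this distribution. The joint p.d.f.
of $X_1$ and $X_2$ is then

$$h(x_1, x_2) = f(x_1) f(x_2) = \begin{cases} 1, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Consider the two random variables $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. We wish
to find the joint p.d.f. of $Y_1$ and $Y_2$. Here the two-dimensional space $\mathcal{A}$ in the
$x_1 x_2$-plane is that of Example 3 of this section. The one-to-one transformation
$y_1 = x_1 + x_2$, $y_2 = x_1 - x_2$ maps $\mathcal{A}$ onto the space $\mathcal{B}$ of that example.
Moreover, the Jacobian of that transformation has been shown to be $J = -\frac{1}{2}$.
Thus

$$g(y_1, y_2) = h[\tfrac{1}{2}(y_1 + y_2), \tfrac{1}{2}(y_1 - y_2)]\,|J| = f[\tfrac{1}{2}(y_1 + y_2)]\, f[\tfrac{1}{2}(y_1 - y_2)]\,|J| = \tfrac{1}{2}, \qquad (y_1, y_2) \in \mathcal{B},$$

and $g(y_1, y_2) = 0$ elsewhere.

Because $\mathcal{B}$ is not a product space, the random variables $Y_1$ and $Y_2$ are
dependent. The marginal p.d.f. of $Y_1$ is given by

$$g_1(y_1) = \int_{-\infty}^{\infty} g(y_1, y_2) \, dy_2.$$

If we refer to Figure 4.2, it is seen that

$$g_1(y_1) = \begin{cases} \displaystyle\int_{-y_1}^{y_1} \tfrac{1}{2} \, dy_2 = y_1, & 0 < y_1 \le 1, \\[3mm] \displaystyle\int_{y_1 - 2}^{2 - y_1} \tfrac{1}{2} \, dy_2 = 2 - y_1, & 1 < y_1 < 2, \\[3mm] 0, & \text{elsewhere.} \end{cases}$$

In a similar manner, the marginal p.d.f. $g_2(y_2)$ is given by

$$g_2(y_2) = \begin{cases} \displaystyle\int_{-y_2}^{y_2 + 2} \tfrac{1}{2} \, dy_1 = y_2 + 1, & -1 < y_2 \le 0, \\[3mm] \displaystyle\int_{y_2}^{2 - y_2} \tfrac{1}{2} \, dy_1 = 1 - y_2, & 0 < y_2 < 1, \\[3mm] 0, & \text{elsewhere.} \end{cases}$$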
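The two triangular marginal densities of Example 4 can be compared with simulated data. The Python sketch below is only an illustration: the function names, the midpoint-rule integrator, the seed, and the cut point 0.5 are all arbitrary assumptions.

```python
import random

def g1(y1):
    """Marginal p.d.f. of Y1 = X1 + X2 for X1, X2 i.i.d. uniform on (0, 1)."""
    if 0.0 < y1 <= 1.0:
        return y1
    if 1.0 < y1 < 2.0:
        return 2.0 - y1
    return 0.0

def g2(y2):
    """Marginal p.d.f. of Y2 = X1 - X2."""
    if -1.0 < y2 <= 0.0:
        return y2 + 1.0
    if 0.0 < y2 < 1.0:
        return 1.0 - y2
    return 0.0

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation to the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

rng = random.Random(3)
pairs = [(rng.random(), rng.random()) for _ in range(200000)]
p1 = sum(x1 + x2 <= 0.5 for x1, x2 in pairs) / len(pairs)  # estimates Pr(Y1 <= 0.5)
p2 = sum(x1 - x2 <= 0.5 for x1, x2 in pairs) / len(pairs)  # estimates Pr(Y2 <= 0.5)
```

The estimate `p1` should be close to `integrate(g1, 0.0, 0.5)` and `p2` close to `integrate(g2, -1.0, 0.5)`, and each density should integrate to 1 over its support.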

Example 5. Let $X_1, X_2$ be a random sample of size $n = 2$ from a standard
normal distribution. Say that we are interested in the distribution
of $Y_1 = X_1/X_2$. Often in selecting the second random variable, we use
the denominator of the ratio or a function of that denominator. So let
$Y_2 = X_2$. With the set $\{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty\}$, we note
that the ratio is not defined at $x_2 = 0$. However, $\Pr(X_2 = 0) = 0$; so we take
the p.d.f. of $X_2$ to be zero at $x_2 = 0$. This results in the set

$$\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < 0 \text{ or } 0 < x_2 < \infty\}.$$

With $y_1 = x_1/x_2$, $y_2 = x_2$ or, equivalently, $x_1 = y_1 y_2$, $x_2 = y_2$, $\mathcal{A}$ maps onto

$$\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ -\infty < y_2 < 0 \text{ or } 0 < y_2 < \infty\}.$$

Also,

$$J = \begin{vmatrix} y_2 & y_1 \\ 0 & 1 \end{vmatrix} = y_2 \ne 0.$$

Since

$$h(x_1, x_2) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2}(x_1^2 + x_2^2)\right], \qquad (x_1, x_2) \in \mathcal{A},$$

we have that the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right] |y_2|, \qquad (y_1, y_2) \in \mathcal{B}.$$

Thus

$$g_1(y_1) = \int_{-\infty}^{0} g(y_1, y_2) \, dy_2 + \int_{0}^{\infty} g(y_1, y_2) \, dy_2.$$

Since $g(y_1, y_2)$ is an even function of $y_2$, we can write

$$g_1(y_1) = 2 \int_0^{\infty} \frac{1}{2\pi} \exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right] y_2 \, dy_2 = \frac{1}{\pi} \left[ \frac{-\exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right]}{1 + y_1^2} \right]_0^{\infty} = \frac{1}{\pi (1 + y_1^2)}, \qquad -\infty < y_1 < \infty.$$

This marginal p.d.f. of $Y_1 = X_1/X_2$ is that of a Cauchy distribution. Although
the Cauchy p.d.f. is symmetric about $y_1 = 0$, the mean does not exist because
the integral

$$\int_{-\infty}^{\infty} |y_1|\, g_1(y_1) \, dy_1$$

does not exist. The median and the mode, however, are both equal to zero.
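A simulation offers a quick plausibility check of Example 5. The sketch below draws ratios of independent standard normal variables and compares their empirical distribution function with the Cauchy distribution function $\frac{1}{2} + \arctan(y)/\pi$, obtained by integrating $1/[\pi(1 + y^2)]$. The seed, the sample size, and the cut points are arbitrary assumptions.

```python
import math
import random

def cauchy_cdf(y):
    """Distribution function of the Cauchy p.d.f. 1 / (pi (1 + y^2))."""
    return 0.5 + math.atan(y) / math.pi

rng = random.Random(4)  # fixed seed for reproducibility
# Ratio of two independent N(0, 1) draws; a zero denominator has probability zero.
ratios = [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(200000)]

# Empirical Pr(Y1 <= c) at a few cut points.
emp = {c: sum(r <= c for r in ratios) / len(ratios) for c in (-1.0, 0.0, 2.0)}
```

Because the mean does not exist, the sample mean of `ratios` is not a useful summary; comparing distribution functions or medians is the safer check.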

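Example 6. Let $Y_1 = \frac{1}{2}(X_1 - X_2)$, where $X_1$ and $X_2$ are i.i.d. random
variables, each being $\chi^2(2)$. The joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1) f(x_2) = \begin{cases} \dfrac{1}{4} \exp\left(-\dfrac{x_1 + x_2}{2}\right), & 0 < x_1 < \infty,\ 0 < x_2 < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

Let $Y_2 = X_2$ so that $y_1 = \frac{1}{2}(x_1 - x_2)$, $y_2 = x_2$ or $x_1 = 2y_1 + y_2$, $x_2 = y_2$ define
a one-to-one transformation from $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < \infty,\ 0 < x_2 < \infty\}$
onto $\mathcal{B} = \{(y_1, y_2) : -2y_1 < y_2 \text{ and } 0 < y_2,\ -\infty < y_1 < \infty\}$. The Jacobian of
the transformation is

$$J = \begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} = 2;$$

hence the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \begin{cases} \dfrac{|2|}{4} e^{-y_1 - y_2}, & (y_1, y_2) \in \mathcal{B}, \\ 0, & \text{elsewhere.} \end{cases}$$

Thus the p.d.f. of $Y_1$ is given by

$$g_1(y_1) = \begin{cases} \displaystyle\int_{-2y_1}^{\infty} \tfrac{1}{2} e^{-y_1 - y_2} \, dy_2 = \tfrac{1}{2} e^{y_1}, & -\infty < y_1 < 0, \\[3mm] \displaystyle\int_{0}^{\infty} \tfrac{1}{2} e^{-y_1 - y_2} \, dy_2 = \tfrac{1}{2} e^{-y_1}, & 0 \le y_1 < \infty, \end{cases}$$

or

$$g_1(y_1) = \tfrac{1}{2} e^{-|y_1|}, \qquad -\infty < y_1 < \infty.$$

This p.d.f. is now frequently called the double exponential p.d.f.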
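The marginal p.d.f. of Example 6 can be confirmed by integrating the joint p.d.f. numerically over $y_2$. The Python sketch below uses a midpoint rule; the function names are hypothetical, and the truncation point `upper = 60.0` for the infinite range is an assumption that is harmless here because the integrand decays exponentially.

```python
import math

def joint_pdf(y1, y2):
    """g(y1, y2) = (1/2) exp(-y1 - y2) on the set where 0 < y2 and -2*y1 < y2."""
    if y2 > 0.0 and y2 > -2.0 * y1:
        return 0.5 * math.exp(-y1 - y2)
    return 0.0

def marginal_y1(y1, upper=60.0, n=100000):
    """Midpoint-rule integral of the joint p.d.f. over y2 (range truncated at `upper`)."""
    lower = max(0.0, -2.0 * y1)  # lower limit of integration depends on the sign of y1
    h = (upper - lower) / n
    return h * sum(joint_pdf(y1, lower + (i + 0.5) * h) for i in range(n))

def double_exponential(y1):
    """The claimed marginal p.d.f. (1/2) exp(-|y1|)."""
    return 0.5 * math.exp(-abs(y1))

gaps = [abs(marginal_y1(y) - double_exponential(y)) for y in (-1.5, -0.2, 0.0, 0.7, 2.0)]
```

The entries of `gaps` should all be very small, for negative and positive $y_1$ alike, which reflects the two cases in the derivation above.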


Example 7. In this example a rather important result is established. Let
$X_1$ and $X_2$ be independent random variables of the continuous type with joint
p.d.f. $f_1(x_1) f_2(x_2)$ that is positive on the two-dimensional space $\mathcal{A}$. Let
$Y_1 = u_1(X_1)$, a function of $X_1$ alone, and $Y_2 = u_2(X_2)$, a function of $X_2$ alone.
We assume for the present that $y_1 = u_1(x_1)$, $y_2 = u_2(x_2)$ define a one-to-one
transformation from $\mathcal{A}$ onto a two-dimensional set $\mathcal{B}$ in the $y_1 y_2$-plane.
Solving for $x_1$ and $x_2$ in terms of $y_1$ and $y_2$, we have $x_1 = w_1(y_1)$ and $x_2 = w_2(y_2)$,
so

$$J = \begin{vmatrix} w_1'(y_1) & 0 \\ 0 & w_2'(y_2) \end{vmatrix} = w_1'(y_1)\, w_2'(y_2) \ne 0.$$

Hence the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \begin{cases} f_1[w_1(y_1)]\, f_2[w_2(y_2)]\, |w_1'(y_1)\, w_2'(y_2)|, & (y_1, y_2) \in \mathcal{B}, \\ 0, & \text{elsewhere.} \end{cases}$$

However, from the procedure for changing variables in the case of
one random variable, we see that the marginal probability density
functions of $Y_1$ and $Y_2$ are, respectively, $g_1(y_1) = f_1[w_1(y_1)]\,|w_1'(y_1)|$ and
$g_2(y_2) = f_2[w_2(y_2)]\,|w_2'(y_2)|$ for $y_1$ and $y_2$ in some appropriate sets.
Consequently,

$$g(y_1, y_2) \equiv g_1(y_1)\, g_2(y_2).$$

Thus, summarizing, we note that if $X_1$ and $X_2$ are independent random
variables, then $Y_1 = u_1(X_1)$ and $Y_2 = u_2(X_2)$ are also independent random
variables. It has been seen that the result holds if $X_1$ and $X_2$ are of the discrete
type; see Exercise 4.22.

In the simulation of random variables using uniform random
variables, it is frequently difficult to solve $y = F(x)$ for $x$. Thus other
methods are necessary. For instance, consider the important normal
case in which we desire to determine $X$ so that it is $N(0, 1)$. Of course,
once $X$ is determined, other normal variables can then be obtained
through $X$ by the transformation $Z = \sigma X + \mu$.

To simulate normal variables, Box and Muller suggested the
following procedure. Let $Y_1, Y_2$ be a random sample from the uniform
distribution over $0 < y < 1$. Define $X_1$ and $X_2$ by

$$X_1 = (-2 \ln Y_1)^{1/2} \cos(2\pi Y_2),$$
$$X_2 = (-2 \ln Y_1)^{1/2} \sin(2\pi Y_2).$$

The corresponding transformation is one-to-one and maps
$\{(y_1, y_2) : 0 < y_1 < 1,\ 0 < y_2 < 1\}$ onto $\{(x_1, x_2) : -\infty < x_1 < \infty,\
-\infty < x_2 < \infty\}$ except for sets involving $x_1 = 0$ and $x_2 = 0$, which
have probability zero. The inverse transformation is given by

$$y_1 = \exp\left(-\frac{x_1^2 + x_2^2}{2}\right), \qquad y_2 = \frac{1}{2\pi} \arctan\frac{x_2}{x_1},$$

which follows from $x_1^2 + x_2^2 = -2 \ln y_1$ and $x_2/x_1 = \tan(2\pi y_2)$. Its Jacobian is

$$J = -\frac{1}{2\pi} \exp\left(-\frac{x_1^2 + x_2^2}{2}\right).$$

Since the joint p.d.f. of $Y_1$ and $Y_2$ is 1 on $0 < y_1 < 1$, $0 < y_2 < 1$, and
zero elsewhere, the joint p.d.f. of $X_1$ and $X_2$ is

$$\frac{1}{2\pi} \exp\left(-\frac{x_1^2 + x_2^2}{2}\right), \qquad -\infty < x_1 < \infty,\ -\infty < x_2 < \infty.$$

That is, $X_1$ and $X_2$ are independent standard normal random variables.
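The Box-Muller procedure is short enough to write out in full. The following Python sketch applies the two defining equations to pairs of uniform random numbers and then computes sample moments. The function name `box_muller`, the seed, and the sample size are assumptions of this illustration, and the first uniform variate is taken from $(0, 1]$ so that the logarithm is always defined.

```python
import math
import random

def box_muller(y1, y2):
    """Map a point (y1, y2) of the unit square to a pair (x1, x2)."""
    r = math.sqrt(-2.0 * math.log(y1))
    return r * math.cos(2.0 * math.pi * y2), r * math.sin(2.0 * math.pi * y2)

rng = random.Random(5)  # fixed seed for reproducibility
xs = []
for _ in range(100000):
    y1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    xs.append(box_muller(y1, rng.random()))

n = len(xs)
m1 = sum(a for a, _ in xs) / n                      # sample mean of X1
m2 = sum(b for _, b in xs) / n                      # sample mean of X2
v1 = sum((a - m1) ** 2 for a, _ in xs) / n          # sample variance of X1
v2 = sum((b - m2) ** 2 for _, b in xs) / n          # sample variance of X2
c12 = sum((a - m1) * (b - m2) for a, b in xs) / n   # sample covariance
```

Sample means near 0, variances near 1, and a covariance near 0 are consistent with two independent $N(0, 1)$ variables, although such moment checks do not by themselves establish normality.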

We close this section by observing a way of finding the p.d.f.
of a sum of two independent random variables. Let $X_1$ and $X_2$
be independent with respective probability density functions $f_1(x_1)$
and $f_2(x_2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_2$. Thus we have the
one-to-one transformation $x_1 = y_1 - y_2$ and $x_2 = y_2$ with Jacobian
$J = 1$. Here we say that $\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty\}$
maps onto $\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ -\infty < y_2 < \infty\}$, but we
recognize that in a particular problem the joint p.d.f. might equal zero
on some part of these sets. Thus the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = f_1(y_1 - y_2)\, f_2(y_2), \qquad (y_1, y_2) \in \mathcal{B},$$

and the marginal p.d.f. of $Y_1 = X_1 + X_2$ is given by

$$g_1(y_1) = \int_{-\infty}^{\infty} f_1(y_1 - y_2)\, f_2(y_2) \, dy_2,$$

which is the well-known convolution formula.
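The convolution formula lends itself to direct numerical evaluation. The sketch below approximates the integral by a midpoint rule for two uniform densities on $(0, 1)$, a case in which the answer is the triangular p.d.f. of a sum of two uniform variables. The function names and the integration limits passed in are assumptions for illustration; in general the limits must cover the set where $f_2$ is positive.

```python
def convolve(f1, f2, y1, lo, hi, n=20000):
    """Midpoint-rule value of the integral of f1(y1 - y2) * f2(y2) over lo < y2 < hi."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y2 = lo + (i + 0.5) * h
        total += f1(y1 - y2) * f2(y2)
    return h * total

def uniform_pdf(x):
    """p.d.f. of the uniform distribution on 0 < x < 1."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

# Values of the p.d.f. of Y1 = X1 + X2 at a few points.
values = {y1: convolve(uniform_pdf, uniform_pdf, y1, 0.0, 1.0) for y1 in (0.5, 1.0, 1.5, 2.5)}
```

Note that the integrand vanishes wherever either factor is zero, so the effective limits of integration depend on $y_1$ even though the code integrates over all of $(0, 1)$.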



EXERCISES 

4.25. Let $X$ have the p.d.f. $f(x) = x^2/9$, $0 < x < 3$, zero elsewhere. Find the
p.d.f. of $Y = X^3$.

4.26. If the p.d.f. of $X$ is $f(x) = 2xe^{-x^2}$, $0 < x < \infty$, zero elsewhere, determine
the p.d.f. of Y =

4.27. Let $X$ have the logistic p.d.f. $f(x) = e^{-x}/(1 + e^{-x})^2$, $-\infty < x < \infty$.

(a) Show that the graph of $f(x)$ is symmetric about the vertical axis through
$x = 0$.

(b) Find the distribution function of $X$.

(c) Find the p.d.f. of $Y = e^{-X}$.

(d) Show that the m.g.f. $M(t)$ of $X$ is $\Gamma(1 - t)\Gamma(1 + t)$, $-1 < t < 1$.
Hint: In the integral representing $M(t)$, let y = (1 +

4.28. Let $X$ have the uniform distribution over the interval $(-\pi/2, \pi/2)$.
Show that $Y = \tan X$ has a Cauchy distribution.

4.29. Let $X_1$ and $X_2$ be two independent normal random variables, each
with mean zero and variance one (possibly resulting from a Box-Muller
transformation). Show that

$$Z_1 = \mu_1 + \sigma_1 X_1,$$
$$Z_2 = \mu_2 + \rho \sigma_2 X_1 + \sigma_2 \sqrt{1 - \rho^2}\, X_2,$$

where $0 < \sigma_1$, $0 < \sigma_2$, and $0 < \rho < 1$, have a bivariate normal distribution
with respective parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

4.30. Let $X_1$ and $X_2$ denote a random sample of size 2 from a distribution that
is $N(\mu, \sigma^2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Find the joint p.d.f. of $Y_1$
and $Y_2$ and show that these random variables are independent.

4.31. Let $X_1$ and $X_2$ denote a random sample of size 2 from a distribution that
is $N(\mu, \sigma^2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 + 2X_2$. Show that the joint p.d.f.
of $Y_1$ and $Y_2$ is bivariate normal with correlation coefficient $3/\sqrt{10}$.

4.32. Use the convolution formula to determine the p.d.f. of $Y_1 = X_1 + X_2$,
where $X_1$ and $X_2$ are i.i.d. random variables, each with p.d.f. $f(x) = e^{-x}$,
$0 < x < \infty$, zero elsewhere.

Hint: Note that the integral on $y_2$ has limits of 0 and $y_1$, where
$0 < y_1 < \infty$. Why?

4.33. Let $X_1$ and $X_2$ have the joint p.d.f. $h(x_1, x_2) = 2e^{-x_1 - x_2}$,
$0 < x_1 < x_2 < \infty$, zero elsewhere. Find the joint p.d.f. of $Y_1 = 2X_1$ and
$Y_2 = X_2 - X_1$ and argue that $Y_1$ and $Y_2$ are independent.

4.34. Let $X_1$ and $X_2$ have the joint p.d.f. $h(x_1, x_2) = 8x_1 x_2$, $0 < x_1 < x_2 < 1$,
zero elsewhere. Find the joint p.d.f. of $Y_1 = X_1/X_2$ and $Y_2 = X_2$ and argue
that $Y_1$ and $Y_2$ are independent.

Hint: Use the inequalities $0 < y_1 y_2 < y_2 < 1$ in considering the mapping
from $\mathcal{A}$ onto $\mathcal{B}$.

4.4 The Beta, t, and F Distributions

It is the purpose of this section to define three additional
distributions quite useful in certain problems of statistical inference.
These are called, respectively, the beta distribution, the (Student's)
t-distribution, and the F-distribution.

The beta distribution. Let $X_1$ and $X_2$ be two independent random
variables that have gamma distributions and joint p.d.f.

$$h(x_1, x_2) = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\, x_1^{\alpha - 1} x_2^{\beta - 1} e^{-x_1 - x_2}, \qquad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$

zero elsewhere, where $\alpha > 0$, $\beta > 0$. Let $Y_1 = X_1 + X_2$ and $Y_2 =
X_1/(X_1 + X_2)$. We shall show that $Y_1$ and $Y_2$ are independent.

The space $\mathcal{A}$ is, exclusive of the points on the coordinate axes, the
first quadrant of the $x_1 x_2$-plane. Now

$$y_1 = u_1(x_1, x_2) = x_1 + x_2,$$
$$y_2 = u_2(x_1, x_2) = \frac{x_1}{x_1 + x_2}$$

may be written $x_1 = y_1 y_2$, $x_2 = y_1(1 - y_2)$, so

$$J = \begin{vmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1 \ne 0.$$

The transformation is one-to-one, and it maps $\mathcal{A}$ onto $\mathcal{B} =
\{(y_1, y_2) : 0 < y_1 < \infty,\ 0 < y_2 < 1\}$ in the $y_1 y_2$-plane. The joint p.d.f.
of $Y_1$ and $Y_2$ is then

$$g(y_1, y_2) = (y_1)\, \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\, (y_1 y_2)^{\alpha - 1} [y_1 (1 - y_2)]^{\beta - 1} e^{-y_1}$$

$$= \begin{cases} \dfrac{y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)}\, y_1^{\alpha + \beta - 1} e^{-y_1}, & 0 < y_1 < \infty,\ 0 < y_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

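In accordance with Theorem 1, Section 2.4, the random variables are
independent. The marginal p.d.f. of $Y_2$ is

$$g_2(y_2) = \frac{y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)} \int_0^{\infty} y_1^{\alpha + \beta - 1} e^{-y_1} \, dy_1$$

$$= \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}, & 0 < y_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

This p.d.f. is that of the beta distribution with parameters $\alpha$ and $\beta$. Since
$g(y_1, y_2) \equiv g_1(y_1)\, g_2(y_2)$, it must be that the p.d.f. of $Y_1$ is

$$g_1(y_1) = \begin{cases} \dfrac{1}{\Gamma(\alpha + \beta)}\, y_1^{\alpha + \beta - 1} e^{-y_1}, & 0 < y_1 < \infty, \\ 0, & \text{elsewhere,} \end{cases}$$

which is that of a gamma distribution with parameter values of
$\alpha + \beta$ and 1.

It is an easy exercise to show that the mean and the variance of
$Y_2$, which has a beta distribution with parameters $\alpha$ and $\beta$, are,
respectively,

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{\alpha \beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}.$$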
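The construction of the beta distribution from two independent gamma variables can be imitated by simulation. The Python sketch below uses `random.gammavariate(shape, scale)` from the standard library with scale 1; the parameter values $\alpha = 2$, $\beta = 3$, the seed, and the helper names are assumptions chosen only for illustration.

```python
import random

def simulate(alpha, beta, n, seed=0):
    """Draw n pairs (Y1, Y2) with Y1 = X1 + X2 and Y2 = X1 / (X1 + X2)."""
    rng = random.Random(seed)
    y1s, y2s = [], []
    for _ in range(n):
        x1 = rng.gammavariate(alpha, 1.0)  # gamma with shape alpha, scale 1
        x2 = rng.gammavariate(beta, 1.0)   # gamma with shape beta, scale 1
        y1s.append(x1 + x2)
        y2s.append(x1 / (x1 + x2))
    return y1s, y2s

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def corr(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)
    return cov / (var(a) * var(b)) ** 0.5

alpha, beta = 2.0, 3.0  # illustrative parameter values
y1s, y2s = simulate(alpha, beta, 100000)
```

For these parameter values the sample mean and variance of `y2s` should be near $\alpha/(\alpha + \beta) = 0.4$ and $\alpha\beta/[(\alpha + \beta + 1)(\alpha + \beta)^2] = 0.04$, and the sample correlation of `y1s` and `y2s` should be near zero, as independence requires.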

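The t-distribution. Let $W$ denote a random variable that is $N(0, 1)$;
let $V$ denote a random variable that is $\chi^2(r)$; and let $W$ and $V$ be
independent. Then the joint p.d.f. of $W$ and $V$, say $h(w, v)$, is the
product of the p.d.f. of $W$ and that of $V$ or

$$h(w, v) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, \dfrac{1}{\Gamma(r/2)\, 2^{r/2}}\, v^{r/2 - 1} e^{-v/2}, & -\infty < w < \infty,\ 0 < v < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

Define a new random variable $T$ by writing

$$T = \frac{W}{\sqrt{V/r}}.$$

The change-of-variable technique will be used to obtain the p.d.f. $g_1(t)$
of $T$. The equations

$$t = \frac{w}{\sqrt{v/r}} \quad \text{and} \quad u = v$$

define a one-to-one transformation that maps $\mathcal{A} = \{(w, v) : -\infty <
w < \infty,\ 0 < v < \infty\}$ onto $\mathcal{B} = \{(t, u) : -\infty < t < \infty,\ 0 < u < \infty\}$.
Since $w = t\sqrt{u}/\sqrt{r}$, $v = u$, the absolute value of the Jacobian of the
transformation is $|J| = \sqrt{u}/\sqrt{r}$. Accordingly, the joint p.d.f. of $T$
and $U = V$ is given by

$$g(t, u) = h\left(\frac{t\sqrt{u}}{\sqrt{r}}, u\right) |J|$$

$$= \begin{cases} \dfrac{1}{\sqrt{2\pi}\, \Gamma(r/2)\, 2^{r/2}}\, u^{r/2 - 1} \exp\left[-\dfrac{u}{2}\left(1 + \dfrac{t^2}{r}\right)\right] \dfrac{\sqrt{u}}{\sqrt{r}}, & -\infty < t < \infty,\ 0 < u < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$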





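The marginal p.d.f. of $T$ is then

$$\begin{aligned}
g_1(t) &= \int_{-\infty}^{\infty} g(t, u)\, du \\
&= \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}}\, u^{(r + 1)/2 - 1} \exp\!\left[-\frac{u}{2}\left(1 + \frac{t^2}{r}\right)\right] du.
\end{aligned}$$

In this integral let $z = u[1 + (t^2/r)]/2$, and it is seen that

$$\begin{aligned}
g_1(t) &= \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\, 2^{r/2}} \left(\frac{2z}{1 + t^2/r}\right)^{(r + 1)/2 - 1} e^{-z} \left(\frac{2}{1 + t^2/r}\right) dz \\
&= \frac{\Gamma[(r + 1)/2]}{\sqrt{\pi r}\,\Gamma(r/2)}\, \frac{1}{(1 + t^2/r)^{(r + 1)/2}}, \qquad -\infty < t < \infty.
\end{aligned}$$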


Thus, if $W$ is $N(0, 1)$, if $V$ is $\chi^2(r)$, and if $W$ and $V$ are independent, then

$$T = \frac{W}{\sqrt{V/r}}$$

has the immediately preceding p.d.f. $g_1(t)$. The distribution of the random variable $T$ is usually called a t-distribution. It should be observed that a t-distribution is completely determined by the parameter $r$, the number of degrees of freedom of the random variable that has the chi-square distribution. Some approximate values of

$$\Pr(T \le t) = \int_{-\infty}^{t} g_1(w)\, dw$$

for selected values of $r$ and $t$ can be found in Table IV in Appendix B.
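Values like those tabulated can also be approximated numerically. The sketch below is illustrative only: the function names and the choice of Simpson's rule are assumptions, not the method used to build Table IV. It evaluates the p.d.f. $g_1(t)$ derived above and integrates it, using the symmetry of the p.d.f. about zero.

```python
import math


def t_pdf(t, r):
    """The p.d.f. g1(t) of the t-distribution with r degrees of freedom."""
    c = math.gamma((r + 1) / 2) / (math.sqrt(math.pi * r) * math.gamma(r / 2))
    return c * (1.0 + t * t / r) ** (-(r + 1) / 2)


def t_cdf(t, r, n_steps=2000):
    """Approximate Pr(T <= t).

    Integrates g1 over [0, |t|] with composite Simpson's rule (n_steps must
    be even) and uses the symmetry of g1 about zero: Pr(T <= 0) = 1/2.
    """
    if t == 0:
        return 0.5
    b = abs(t)
    h = b / n_steps
    total = t_pdf(0.0, r) + t_pdf(b, r)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, r)
    half = total * h / 3.0
    return 0.5 + half if t > 0 else 0.5 - half


if __name__ == "__main__":
    # Two-sided tail probability for r = 10 at t = 2.228.
    print(2.0 * (1.0 - t_cdf(2.228, 10)))
```

For $r = 1$ the p.d.f. reduces to $1/[\pi(1 + t^2)]$, which gives a quick sanity check of the constant.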

Remark. This distribution was first discovered by W. S. Gosset when he was working for an Irish brewery. Because that brewery did not want other breweries to know that statistical methods were being used, Gosset published under the pseudonym Student. Thus this distribution is often known as Student's t-distribution.

The F-distribution. Next consider two independent chi-square random variables $U$ and $V$ having $r_1$ and $r_2$ degrees of freedom, respectively. The joint p.d.f. $h(u, v)$ of $U$ and $V$ is then

$$\begin{aligned}
h(u, v) &= \frac{1}{\Gamma(r_1/2)\Gamma(r_2/2)\, 2^{(r_1 + r_2)/2}}\, u^{r_1/2 - 1} v^{r_2/2 - 1} e^{-(u + v)/2}, \qquad 0 < u < \infty,\ 0 < v < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

We define the new random variable

$$W = \frac{U/r_1}{V/r_2}$$

and we propose finding the p.d.f. $g_1(w)$ of $W$. The equations

$$w = \frac{u/r_1}{v/r_2}, \qquad z = v,$$

define a one-to-one transformation that maps the set $\mathcal{A} = \{(u, v) : 0 < u < \infty,\ 0 < v < \infty\}$ onto the set $\mathcal{B} = \{(w, z) : 0 < w < \infty,\ 0 < z < \infty\}$. Since $u = (r_1/r_2)zw$, $v = z$, the absolute value of the Jacobian of the transformation is $|J| = (r_1/r_2)z$. The joint p.d.f. $g(w, z)$ of the random variables $W$ and $Z = V$ is then


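$$g(w, z) = \frac{1}{\Gamma(r_1/2)\Gamma(r_2/2)\, 2^{(r_1 + r_2)/2}} \left(\frac{r_1 z w}{r_2}\right)^{r_1/2 - 1} z^{r_2/2 - 1} \exp\!\left[-\frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right)\right] \frac{r_1 z}{r_2},$$

provided that $(w, z) \in \mathcal{B}$, and zero elsewhere. The marginal p.d.f. $g_1(w)$ of $W$ is then

$$\begin{aligned}
g_1(w) &= \int_{-\infty}^{\infty} g(w, z)\, dz \\
&= \int_0^{\infty} \frac{(r_1/r_2)^{r_1/2}\, w^{r_1/2 - 1}}{\Gamma(r_1/2)\Gamma(r_2/2)\, 2^{(r_1 + r_2)/2}}\, z^{(r_1 + r_2)/2 - 1} \exp\!\left[-\frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right)\right] dz.
\end{aligned}$$

If we change the variable of integration by writing

$$y = \frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right),$$

it can be seen that

$$\begin{aligned}
g_1(w) &= \int_0^{\infty} \frac{(r_1/r_2)^{r_1/2}\, w^{r_1/2 - 1}}{\Gamma(r_1/2)\Gamma(r_2/2)\, 2^{(r_1 + r_2)/2}} \left(\frac{2y}{r_1 w/r_2 + 1}\right)^{(r_1 + r_2)/2 - 1} e^{-y} \left(\frac{2}{r_1 w/r_2 + 1}\right) dy \\
&= \frac{\Gamma[(r_1 + r_2)/2]\,(r_1/r_2)^{r_1/2}}{\Gamma(r_1/2)\Gamma(r_2/2)}\, \frac{w^{r_1/2 - 1}}{(1 + r_1 w/r_2)^{(r_1 + r_2)/2}}, \qquad 0 < w < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$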


Accordingly, if $U$ and $V$ are independent chi-square variables with $r_1$ and $r_2$ degrees of freedom, respectively, then

$$W = \frac{U/r_1}{V/r_2}$$

has the immediately preceding p.d.f. $g_1(w)$. The distribution of this random variable is usually called an F-distribution; and we often call the ratio, which we have denoted by $W$, $F$. That is,

$$F = \frac{U/r_1}{V/r_2}.$$

It should be observed that an F-distribution is completely determined by the two parameters $r_1$ and $r_2$. Table V in Appendix B gives some approximate values of

$$\Pr(F \le b) = \int_0^{b} g_1(w)\, dw$$

for selected values of $r_1$, $r_2$, and $b$.
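As with the t-distribution, these probabilities can be approximated by integrating the p.d.f. directly. The following is a hedged sketch: the function names and the midpoint rule are illustrative assumptions, and the result is a numerical approximation, not a reproduction of Table V. The p.d.f. is evaluated on the log scale with `math.lgamma` to avoid overflow for large degrees of freedom.

```python
import math


def f_pdf(w, r1, r2):
    """The p.d.f. g1(w) of the F-distribution with parameters r1 and r2."""
    if w <= 0:
        return 0.0
    log_c = (math.lgamma((r1 + r2) / 2) - math.lgamma(r1 / 2)
             - math.lgamma(r2 / 2) + (r1 / 2) * math.log(r1 / r2))
    log_val = (log_c + (r1 / 2 - 1) * math.log(w)
               - ((r1 + r2) / 2) * math.log1p(r1 * w / r2))
    return math.exp(log_val)


def f_cdf(b, r1, r2, n_steps=20_000):
    """Approximate Pr(F <= b) with the composite midpoint rule on (0, b).

    The midpoint rule never evaluates the integrand at w = 0, where the
    p.d.f. can be unbounded when r1 < 2.
    """
    if b <= 0:
        return 0.0
    h = b / n_steps
    return h * sum(f_pdf((i + 0.5) * h, r1, r2) for i in range(n_steps))


if __name__ == "__main__":
    print(f_cdf(3.33, 5, 10))
```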


EXERCISES 


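4.35. Find the mean and variance of the beta distribution.
Hint: From that p.d.f., we know that

$$\int_0^1 y^{\alpha - 1}(1 - y)^{\beta - 1}\, dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

for all $\alpha > 0$, $\beta > 0$.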

4.36. Determine the constant $c$ in each of the following so that each $f(x)$ is a beta p.d.f.

(a) $f(x) = cx(1 - x)^3$, $0 < x < 1$, zero elsewhere.

(b) $f(x) = cx^4(1 - x)^5$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = cx^2(1 - x)^8$, $0 < x < 1$, zero elsewhere.

4.37. Determine the constant $c$ so that $f(x) = cx(3 - x)^4$, $0 < x < 3$, zero elsewhere, is a p.d.f.

4.38. Show that the graph of the beta p.d.f. is symmetric about the vertical line through $x = \tfrac{1}{2}$ if $\alpha = \beta$.

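4.39. Show, for $k = 1, 2, \ldots, n$, that

$$\int_p^1 \frac{n!}{(k - 1)!\,(n - k)!}\, z^{k - 1}(1 - z)^{n - k}\, dz = \sum_{x = 0}^{k - 1} \binom{n}{x} p^x (1 - p)^{n - x}.$$

This demonstrates the relationship between the distribution functions of the beta and binomial distributions.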

4.40. Let $T$ have a t-distribution with 10 degrees of freedom. Find $\Pr(|T| > 2.228)$ from Table IV.

4.41. Let $T$ have a t-distribution with 14 degrees of freedom. Determine $b$ so that $\Pr(-b < T < b) = 0.90$.

4.42. Let $F$ have an F-distribution with parameters $r_1$ and $r_2$. Prove that $1/F$ has an F-distribution with parameters $r_2$ and $r_1$.

4.43. If $F$ has an F-distribution with parameters $r_1 = 5$ and $r_2 = 10$, find $a$ and $b$ so that $\Pr(F \le a) = 0.05$ and $\Pr(F \le b) = 0.95$, and, accordingly, $\Pr(a < F < b) = 0.90$.

Hint: Write $\Pr(F \le a) = \Pr(1/F \ge 1/a) = 1 - \Pr(1/F \le 1/a)$, and use the result of Exercise 4.42 and Table V.

4.44. Let $T = W/\sqrt{V/r}$, where the independent variables $W$ and $V$ are, respectively, normal with mean zero and variance 1 and chi-square with $r$ degrees of freedom. Show that $T^2$ has an F-distribution with parameters $r_1 = 1$ and $r_2 = r$.

Hint: What is the distribution of the numerator of $T^2$?

4.45. Show that the t-distribution with $r = 1$ degree of freedom and the Cauchy distribution are the same.

4.46. Show that

$$Y = \frac{1}{1 + (r_1/r_2)W},$$

where $W$ has an F-distribution with parameters $r_1$ and $r_2$, has a beta distribution.

4.47. Let $X_1, X_2$ be a random sample from a distribution having the p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that $Z = X_1/X_2$ has an F-distribution.
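
4.5 Extensions of the Change-of-Variable Technique

In Section 4.3 it was seen that the determination of the joint p.d.f. of two functions of two random variables of the continuous type was essentially a corollary to a theorem in analysis having to do with the change of variables in a twofold integral. This theorem has a natural extension to $n$-fold integrals. This extension is as follows. Consider an integral of the form

$$\int \cdots \int_A h(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$

taken over a subset $A$ of an $n$-dimensional space $\mathcal{A}$. Let

$$y_1 = u_1(x_1, x_2, \ldots, x_n), \quad y_2 = u_2(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = u_n(x_1, x_2, \ldots, x_n),$$

together with the inverse functions

$$x_1 = w_1(y_1, y_2, \ldots, y_n), \quad x_2 = w_2(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = w_n(y_1, y_2, \ldots, y_n),$$

define a one-to-one transformation that maps $\mathcal{A}$ onto $\mathcal{B}$ in the $y_1, y_2, \ldots, y_n$ space (and hence maps the subset $A$ onto a subset $B$ of $\mathcal{B}$). Let the first partial derivatives of the inverse functions be continuous and let the $n$ by $n$ determinant (called the Jacobian)

$$J = \begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix}$$

not be identically zero in $\mathcal{B}$. Then

$$\int \cdots \int_A h(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = \int \cdots \int_B h[w_1(y_1, \ldots, y_n), w_2(y_1, \ldots, y_n), \ldots, w_n(y_1, \ldots, y_n)]\, |J|\, dy_1\, dy_2 \cdots dy_n.$$
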
Whenever the conditions of this theorem are satisfied, we can determine the joint p.d.f. of $n$ functions of $n$ random variables. Appropriate changes of notation in Section 4.3 (to indicate $n$-space as opposed to 2-space) are all that is needed to show that the joint p.d.f. of the random variables $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, $\ldots$, $Y_n = u_n(X_1, X_2, \ldots, X_n)$, where the joint p.d.f. of $X_1, X_2, \ldots, X_n$ is $h(x_1, \ldots, x_n)$, is given by

$$g(y_1, y_2, \ldots, y_n) = |J|\, h[w_1(y_1, \ldots, y_n), \ldots, w_n(y_1, \ldots, y_n)],$$

when $(y_1, \ldots, y_n) \in \mathcal{B}$, and is zero elsewhere.

Example 1. Let $X_1, X_2, \ldots, X_{k+1}$ be independent random variables, each having a gamma distribution with $\beta = 1$. The joint p.d.f. of these variables may be written as

$$\begin{aligned}
h(x_1, x_2, \ldots, x_{k+1}) &= \prod_{i=1}^{k+1} \frac{1}{\Gamma(\alpha_i)}\, x_i^{\alpha_i - 1} e^{-x_i}, \qquad 0 < x_i < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

Let

$$Y_i = \frac{X_i}{X_1 + X_2 + \cdots + X_{k+1}}, \qquad i = 1, 2, \ldots, k,$$

and $Y_{k+1} = X_1 + X_2 + \cdots + X_{k+1}$ denote $k + 1$ new random variables. The associated transformation maps $\mathcal{A} = \{(x_1, \ldots, x_{k+1}) : 0 < x_i < \infty,\ i = 1, \ldots, k + 1\}$ onto the space

$$\mathcal{B} = \{(y_1, \ldots, y_k, y_{k+1}) : 0 < y_i,\ i = 1, \ldots, k,\ y_1 + \cdots + y_k < 1,\ 0 < y_{k+1} < \infty\}.$$

The single-valued inverse functions are $x_1 = y_1 y_{k+1}, \ldots, x_k = y_k y_{k+1}$, $x_{k+1} = y_{k+1}(1 - y_1 - \cdots - y_k)$, so that the Jacobian is

$$J = \begin{vmatrix}
y_{k+1} & 0 & \cdots & 0 & y_1 \\
0 & y_{k+1} & \cdots & 0 & y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & y_{k+1} & y_k \\
-y_{k+1} & -y_{k+1} & \cdots & -y_{k+1} & (1 - y_1 - \cdots - y_k)
\end{vmatrix} = y_{k+1}^{k}.$$

Hence the joint p.d.f. of $Y_1, \ldots, Y_k, Y_{k+1}$ is given by

$$\frac{y_{k+1}^{\alpha_1 + \cdots + \alpha_{k+1} - 1}\, y_1^{\alpha_1 - 1} \cdots y_k^{\alpha_k - 1}\,(1 - y_1 - \cdots - y_k)^{\alpha_{k+1} - 1}\, e^{-y_{k+1}}}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)\Gamma(\alpha_{k+1})},$$

provided that $(y_1, \ldots, y_k, y_{k+1}) \in \mathcal{B}$ and is equal to zero elsewhere. The joint p.d.f. of $Y_1, \ldots, Y_k$ is seen by inspection to be given by

$$g(y_1, \ldots, y_k) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_{k+1})}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_{k+1})}\, y_1^{\alpha_1 - 1} \cdots y_k^{\alpha_k - 1}\,(1 - y_1 - \cdots - y_k)^{\alpha_{k+1} - 1},$$

when $0 < y_i$, $i = 1, \ldots, k$, $y_1 + \cdots + y_k < 1$, while the function $g$ is equal to zero elsewhere. Random variables $Y_1, \ldots, Y_k$ that have a joint p.d.f. of this form are said to have a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_k, \alpha_{k+1}$, and any such $g(y_1, \ldots, y_k)$ is called a Dirichlet p.d.f. It is seen, in the special case of $k = 1$, that the Dirichlet p.d.f. becomes a beta p.d.f. Moreover, it is also clear from the joint p.d.f. of $Y_1, \ldots, Y_k, Y_{k+1}$ that $Y_{k+1}$ has a gamma distribution with parameters $\alpha_1 + \cdots + \alpha_k + \alpha_{k+1}$ and $\beta = 1$ and that $Y_{k+1}$ is independent of $Y_1, Y_2, \ldots, Y_k$.
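The value of the Jacobian in Example 1 can be spot-checked numerically. The sketch below is illustrative only: the helper names are hypothetical and the determinant routine is plain Gaussian elimination. It builds the matrix of partial derivatives of the inverse functions at one point and compares its determinant with $y_{k+1}^k$.

```python
def dirichlet_jacobian(y):
    """Matrix of partial derivatives d x_i / d y_j for Example 1.

    `y` is the list [y_1, ..., y_k, y_{k+1}]; the inverse functions are
    x_i = y_i * y_{k+1} for i <= k and
    x_{k+1} = y_{k+1} * (1 - y_1 - ... - y_k).
    """
    k = len(y) - 1
    total = y[k]
    rows = []
    for i in range(k):
        row = [0.0] * (k + 1)
        row[i] = total   # d(y_i * y_{k+1}) / d y_i
        row[k] = y[i]    # d(y_i * y_{k+1}) / d y_{k+1}
        rows.append(row)
    rows.append([-total] * k + [1.0 - sum(y[:k])])
    return rows


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


if __name__ == "__main__":
    point = [0.2, 0.3, 0.1, 2.5]  # k = 3, y_{k+1} = 2.5
    print(determinant(dirichlet_jacobian(point)), 2.5 ** 3)
```

A check at one point does not prove the identity, but it catches sign or indexing slips when the matrix is written out by hand.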


We now consider some other problems that are encountered when transforming variables. Let $X$ have the Cauchy p.d.f.

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty,$$

and let $Y = X^2$. We seek the p.d.f. $g(y)$ of $Y$. Consider the transformation $y = x^2$. This transformation maps the space of $X$, $\mathcal{A} = \{x : -\infty < x < \infty\}$, onto $\mathcal{B} = \{y : 0 \le y < \infty\}$. However, the transformation is not one-to-one. To each $y \in \mathcal{B}$, with the exception of $y = 0$, there correspond two points $x \in \mathcal{A}$. For example, if $y = 4$, we may have either $x = 2$ or $x = -2$. In such an instance, we represent $\mathcal{A}$ as the union of two disjoint sets $A_1$ and $A_2$ such that $y = x^2$ defines a one-to-one transformation that maps each of $A_1$ and $A_2$ onto $\mathcal{B}$. If we take $A_1$ to be $\{x : -\infty < x < 0\}$ and $A_2$ to be $\{x : 0 \le x < \infty\}$, we see that $A_1$ is mapped onto $\{y : 0 < y < \infty\}$, whereas $A_2$ is mapped onto $\{y : 0 \le y < \infty\}$, and these sets are not the same. Our difficulty is caused by the fact that $x = 0$ is an element of $\mathcal{A}$. Why, then, do we not return to the Cauchy p.d.f. and take $f(0) = 0$? Then our new $\mathcal{A}$ is $\mathcal{A} = \{-\infty < x < \infty \text{ but } x \neq 0\}$. We then take $A_1 = \{x : -\infty < x < 0\}$ and $A_2 = \{x : 0 < x < \infty\}$. Thus $y = x^2$, with the inverse $x = -\sqrt{y}$, maps $A_1$ onto $\mathcal{B} = \{y : 0 < y < \infty\}$ and the transformation is one-to-one. Moreover, the transformation $y = x^2$, with inverse $x = \sqrt{y}$, maps $A_2$ onto $\mathcal{B} = \{y : 0 < y < \infty\}$ and the transformation is one-to-one. Consider the probability $\Pr(Y \in B)$, where $B \subset \mathcal{B}$. Let $A_3 = \{x : x = -\sqrt{y},\ y \in B\} \subset A_1$ and let $A_4 = \{x : x = \sqrt{y},\ y \in B\} \subset A_2$. Then $Y \in B$ when and only when $X \in A_3$ or $X \in A_4$. Thus we have

$$\begin{aligned}
\Pr(Y \in B) &= \Pr(X \in A_3) + \Pr(X \in A_4) \\
&= \int_{A_3} f(x)\, dx + \int_{A_4} f(x)\, dx.
\end{aligned}$$

In the first of these integrals, let $x = -\sqrt{y}$. Thus the Jacobian, say $J_1$, is $-1/(2\sqrt{y})$; moreover, the set $A_3$ is mapped onto $B$. In the second integral let $x = \sqrt{y}$. Thus the Jacobian, say $J_2$, is $1/(2\sqrt{y})$; moreover, the set $A_4$ is also mapped onto $B$. Finally,

$$\begin{aligned}
\Pr(Y \in B) &= \int_B f(-\sqrt{y}) \left|-\frac{1}{2\sqrt{y}}\right| dy + \int_B f(\sqrt{y})\, \frac{1}{2\sqrt{y}}\, dy \\
&= \int_B \left[f(-\sqrt{y}) + f(\sqrt{y})\right] \frac{1}{2\sqrt{y}}\, dy.
\end{aligned}$$

Hence the p.d.f. of $Y$ is given by

$$g(y) = \frac{1}{2\sqrt{y}} \left[f(-\sqrt{y}) + f(\sqrt{y})\right], \qquad y \in \mathcal{B}.$$

With $f(x)$ the Cauchy p.d.f. we have

$$\begin{aligned}
g(y) &= \frac{1}{\pi(1 + y)\sqrt{y}}, \qquad 0 < y < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$
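The two-inverse formula for $g(y)$ is easy to mirror in code. The following minimal sketch uses illustrative function names that are not from the text. It sums $f$ over the two inverse points $x = -\sqrt{y}$ and $x = \sqrt{y}$, weights each by $|J| = 1/(2\sqrt{y})$, and compares the result with the closed form obtained for the Cauchy case.

```python
import math


def cauchy_pdf(x):
    """Cauchy p.d.f. f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def pdf_of_square(f, y):
    """p.d.f. of Y = X^2 at y, given the p.d.f. f of X.

    Adds the contributions of the two inverse functions x = -sqrt(y) and
    x = +sqrt(y); each has |Jacobian| = 1 / (2 * sqrt(y)).
    """
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    return (f(-root) + f(root)) / (2.0 * root)


def cauchy_square_closed_form(y):
    """Closed form g(y) = 1 / (pi * (1 + y) * sqrt(y)) for y > 0."""
    if y <= 0:
        return 0.0
    return 1.0 / (math.pi * (1.0 + y) * math.sqrt(y))


if __name__ == "__main__":
    for y in (0.1, 1.0, 4.0, 25.0):
        print(y, pdf_of_square(cauchy_pdf, y), cauchy_square_closed_form(y))
```

Because `pdf_of_square` takes the p.d.f. as an argument, the same helper applies to any continuous $X$ whose space is symmetric enough that both inverse points are admissible.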


In the preceding discussion of a random variable of the continuous type, we had two inverse functions, $x = -\sqrt{y}$ and $x = \sqrt{y}$. That is why we sought to partition $\mathcal{A}$ (or a modification of $\mathcal{A}$) into two disjoint subsets such that the transformation $y = x^2$ maps each onto the same $\mathcal{B}$. Had there been three inverse functions, we would have sought to partition $\mathcal{A}$ (or a modified form of $\mathcal{A}$) into three disjoint subsets, and so on. It is hoped that this detailed discussion will make the following paragraph easier to read.

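Let $h(x_1, x_2, \ldots, x_n)$ be the joint p.d.f. of $X_1, X_2, \ldots, X_n$, which are random variables of the continuous type. Let $\mathcal{A}$ be the $n$-dimensional space where $h(x_1, x_2, \ldots, x_n) > 0$, and consider the transformation $y_1 = u_1(x_1, x_2, \ldots, x_n)$, $y_2 = u_2(x_1, x_2, \ldots, x_n)$, $\ldots$, $y_n = u_n(x_1, x_2, \ldots, x_n)$, which maps $\mathcal{A}$ onto $\mathcal{B}$ in the $y_1, y_2, \ldots, y_n$ space. To each point of $\mathcal{A}$ there will correspond, of course, but one point in $\mathcal{B}$; but to a point in $\mathcal{B}$ there may correspond more than one point in $\mathcal{A}$. That is, the transformation may not be one-to-one. Suppose, however, that we can represent $\mathcal{A}$ as the union of a finite number, say $k$, of mutually disjoint sets $A_1, A_2, \ldots, A_k$ so that

$$y_1 = u_1(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = u_n(x_1, x_2, \ldots, x_n)$$

define a one-to-one transformation of each $A_i$ onto $\mathcal{B}$. Thus, to each point in $\mathcal{B}$ there will correspond exactly one point in each of $A_1, A_2, \ldots, A_k$. Let

$$\begin{aligned}
x_1 &= w_{1i}(y_1, y_2, \ldots, y_n), \\
x_2 &= w_{2i}(y_1, y_2, \ldots, y_n), \\
&\ \ \vdots \\
x_n &= w_{ni}(y_1, y_2, \ldots, y_n),
\end{aligned} \qquad i = 1, 2, \ldots, k,$$

denote the $k$ groups of $n$ inverse functions, one group for each of these $k$ transformations. Let the first partial derivatives be continuous and let each

$$J_i = \begin{vmatrix}
\dfrac{\partial w_{1i}}{\partial y_1} & \dfrac{\partial w_{1i}}{\partial y_2} & \cdots & \dfrac{\partial w_{1i}}{\partial y_n} \\
\dfrac{\partial w_{2i}}{\partial y_1} & \dfrac{\partial w_{2i}}{\partial y_2} & \cdots & \dfrac{\partial w_{2i}}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial w_{ni}}{\partial y_1} & \dfrac{\partial w_{ni}}{\partial y_2} & \cdots & \dfrac{\partial w_{ni}}{\partial y_n}
\end{vmatrix}, \qquad i = 1, 2, \ldots, k,$$

be not identically equal to zero in $\mathcal{B}$. From a consideration of the probability of the union of $k$ mutually exclusive events and by applying the change-of-variable technique to the probability of each of these events, it can be seen that the joint p.d.f. of $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, $\ldots$, $Y_n = u_n(X_1, X_2, \ldots, X_n)$, is given by

$$g(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{k} |J_i|\, h[w_{1i}(y_1, \ldots, y_n), \ldots, w_{ni}(y_1, \ldots, y_n)],$$

provided that $(y_1, y_2, \ldots, y_n) \in \mathcal{B}$, and equals zero elsewhere. The p.d.f. of any $Y_i$, say $Y_1$, is then

$$g_1(y_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_n)\, dy_2 \cdots dy_n.$$

An illustrative example follows.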

Example 2. To illustrate the result just obtained, take $n = 2$ and let $X_1, X_2$ denote a random sample of size 2 from a standard normal distribution. The joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1, x_2) = \frac{1}{2\pi} \exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right), \qquad -\infty < x_1 < \infty,\ -\infty < x_2 < \infty.$$

Let $Y_1$ denote the mean and let $Y_2$ denote twice the variance of the random sample. The associated transformation is

$$y_1 = \frac{x_1 + x_2}{2}, \qquad y_2 = \frac{(x_1 - x_2)^2}{2}.$$

This transformation maps $\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty\}$ onto $\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ 0 \le y_2 < \infty\}$. But the transformation is not one-to-one because, to each point in $\mathcal{B}$, exclusive of points where $y_2 = 0$, there correspond two points in $\mathcal{A}$. In fact, the two groups of inverse functions are

$$x_1 = y_1 - \sqrt{\frac{y_2}{2}}, \qquad x_2 = y_1 + \sqrt{\frac{y_2}{2}}$$

and

$$x_1 = y_1 + \sqrt{\frac{y_2}{2}}, \qquad x_2 = y_1 - \sqrt{\frac{y_2}{2}}.$$

Moreover, the set $\mathcal{A}$ cannot be represented as the union of two disjoint sets, each of which under our transformation maps onto $\mathcal{B}$. Our difficulty is caused by those points of $\mathcal{A}$ that lie on the line whose equation is $x_2 = x_1$. At each of these points, we have $y_2 = 0$. However, we can define $f(x_1, x_2)$ to be zero at each point where $x_1 = x_2$. We can do this without altering the distribution of probability, because the probability measure of this set is zero. Thus we have a new $\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty, \text{ but } x_1 \neq x_2\}$. This space is the union of the two disjoint sets $A_1 = \{(x_1, x_2) : x_2 > x_1\}$ and $A_2 = \{(x_1, x_2) : x_2 < x_1\}$. Moreover, our transformation now defines a one-to-one transformation of each $A_i$, $i = 1, 2$, onto the new $\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ 0 < y_2 < \infty\}$. We can now find the joint p.d.f., say $g(y_1, y_2)$, of the mean $Y_1$ and twice the variance $Y_2$ of our random sample. An easy computation shows that $|J_1| = |J_2| = 1/\sqrt{2y_2}$. Thus

$$\begin{aligned}
g(y_1, y_2) &= \frac{1}{2\pi} \exp\!\left[-\frac{(y_1 - \sqrt{y_2/2})^2}{2} - \frac{(y_1 + \sqrt{y_2/2})^2}{2}\right] \frac{1}{\sqrt{2y_2}} \\
&\quad + \frac{1}{2\pi} \exp\!\left[-\frac{(y_1 + \sqrt{y_2/2})^2}{2} - \frac{(y_1 - \sqrt{y_2/2})^2}{2}\right] \frac{1}{\sqrt{2y_2}} \\
&= \sqrt{\frac{2}{2\pi}}\, e^{-y_1^2} \cdot \frac{1}{\sqrt{2}\,\Gamma(\tfrac{1}{2})}\, y_2^{1/2 - 1} e^{-y_2/2}, \qquad -\infty < y_1 < \infty,\ 0 < y_2 < \infty.
\end{aligned}$$

We can make three interesting observations. The mean $Y_1$ of our random sample is $N(0, \tfrac{1}{2})$; $Y_2$, which is twice the variance of our sample, is $\chi^2(1)$; and the two are independent. Thus the mean and the variance of our sample are independent.
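The factorization in Example 2 can be verified pointwise. The following is a small sketch with hypothetical function names: it evaluates the sum over the two groups of inverse functions, each weighted by $|J_i| = 1/\sqrt{2y_2}$, and compares it with the product of the $N(0, \tfrac{1}{2})$ and $\chi^2(1)$ densities.

```python
import math


def h(x1, x2):
    """Joint p.d.f. of two independent N(0, 1) random variables."""
    return math.exp(-(x1 * x1 + x2 * x2) / 2.0) / (2.0 * math.pi)


def joint_pdf_by_branches(y1, y2):
    """g(y1, y2) as the sum over the two inverse-function groups."""
    if y2 <= 0:
        return 0.0
    s = math.sqrt(y2 / 2.0)
    abs_jacobian = 1.0 / math.sqrt(2.0 * y2)  # same for both groups
    branches = [(y1 - s, y1 + s), (y1 + s, y1 - s)]
    return sum(abs_jacobian * h(x1, x2) for x1, x2 in branches)


def product_of_marginals(y1, y2):
    """N(0, 1/2) p.d.f. at y1 times chi-square(1) p.d.f. at y2."""
    if y2 <= 0:
        return 0.0
    normal_part = math.exp(-y1 * y1) / math.sqrt(math.pi)
    chi2_part = math.exp(-y2 / 2.0) / math.sqrt(2.0 * math.pi * y2)
    return normal_part * chi2_part


if __name__ == "__main__":
    for point in [(0.0, 1.0), (-0.7, 0.3), (1.5, 4.0)]:
        print(point, joint_pdf_by_branches(*point), product_of_marginals(*point))
```

The two columns agree to floating-point precision, which reflects the algebraic identity $(y_1 - s)^2 + (y_1 + s)^2 = 2y_1^2 + y_2$ with $s = \sqrt{y_2/2}$.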


EXERCISES 

4.48. Let $X_1, X_2, X_3$ denote a random sample from a standard normal distribution. Let the random variables $Y_1, Y_2, Y_3$ be defined by

$$X_1 = Y_1 \cos Y_2 \sin Y_3, \qquad X_2 = Y_1 \sin Y_2 \sin Y_3, \qquad X_3 = Y_1 \cos Y_3,$$

where $0 \le Y_1 < \infty$, $0 \le Y_2 < 2\pi$, $0 \le Y_3 \le \pi$. Show that $Y_1, Y_2, Y_3$ are mutually independent.

4.49. Let $X_1, X_2, X_3$ be i.i.d., each with the distribution having p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that

$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = X_1 + X_2 + X_3$$

are mutually independent.

4.50. Let $X_1, X_2, \ldots, X_r$ be $r$ independent gamma variables with parameters $\alpha = \alpha_i$ and $\beta = 1$, $i = 1, 2, \ldots, r$, respectively. Show that $Y_1 = X_1 + X_2 + \cdots + X_r$ has a gamma distribution with parameters $\alpha = \alpha_1 + \cdots + \alpha_r$ and $\beta = 1$.

Hint: Let $Y_2 = X_2 + \cdots + X_r$, $Y_3 = X_3 + \cdots + X_r$, $\ldots$, $Y_r = X_r$.

4.51. Let $Y_1, \ldots, Y_k$ have a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_k, \alpha_{k+1}$.

(a) Show that $Y_1$ has a beta distribution with parameters $\alpha = \alpha_1$ and $\beta = \alpha_2 + \cdots + \alpha_{k+1}$.

(b) Show that $Y_1 + \cdots + Y_r$, $r \le k$, has a beta distribution with parameters $\alpha = \alpha_1 + \cdots + \alpha_r$ and $\beta = \alpha_{r+1} + \cdots + \alpha_{k+1}$.

(c) Show that $Y_1 + Y_2$, $Y_3 + Y_4$, $Y_5, \ldots, Y_k$, $k \ge 5$, have a Dirichlet distribution with parameters $\alpha_1 + \alpha_2$, $\alpha_3 + \alpha_4$, $\alpha_5, \ldots, \alpha_k$, $\alpha_{k+1}$.

Hint: Recall the definition of $Y_i$ in Example 1 and use the fact that the sum of several independent gamma variables with $\beta = 1$ is a gamma variable (Exercise 4.50).

4.52. Let $X_1$, $X_2$, and $X_3$ be three independent chi-square variables with $r_1$, $r_2$, and $r_3$ degrees of freedom, respectively.

(a) Show that $Y_1 = X_1/X_2$ and $Y_2 = X_1 + X_2$ are independent and that $Y_2$ is $\chi^2(r_1 + r_2)$.

(b) Deduce that

$$\frac{X_1/r_1}{X_2/r_2} \qquad \text{and} \qquad \frac{X_3/r_3}{(X_1 + X_2)/(r_1 + r_2)}$$

are independent F-variables.

4.53. If $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, zero elsewhere, is the p.d.f. of the random variable $X$, find the p.d.f. of $Y = X^2$.

4.54. If $X_1, X_2$ is a random sample from a standard normal distribution, find the joint p.d.f. of $Y_1 = X_1^2 + X_2^2$ and $Y_2 = X_2$ and the marginal p.d.f. of $Y_1$.

Hint: Note that the space of $Y_1$ and $Y_2$ is given by $-\sqrt{y_1} < y_2 < \sqrt{y_1}$, $0 < y_1 < \infty$.

4.55. If $X$ has the p.d.f. $f(x) = \tfrac{1}{4}$, $-1 < x < 3$, zero elsewhere, find the p.d.f. of $Y = X^2$.

Hint: Here $\mathcal{B} = \{y : 0 \le y < 9\}$ and the event $Y \in B$ is the union of two mutually exclusive events if $B = \{y : 0 < y < 1\}$.


4.6 Distributions of Order Statistics 

In this section the notion of an order statistic will be defined and 
we shall investigate some of the simpler properties of such a statistic. 
These statistics have in recent times come to play an important role 
in statistical inference partly because some of their properties do 
not depend upon the distribution from which the random sample is 
obtained. 

Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution of the continuous type having a p.d.f. $f(x)$ that is positive, provided that $a < x < b$. Let $Y_1$ be the smallest of these $X_i$, $Y_2$ the next $X_i$ in order of magnitude, $\ldots$, and $Y_n$ the largest $X_i$. That is, $Y_1 < Y_2 < \cdots < Y_n$ represent $X_1, X_2, \ldots, X_n$ when the latter are arranged in ascending order of magnitude. Then $Y_i$, $i = 1, 2, \ldots, n$, is called the $i$th order statistic of the random sample $X_1, X_2, \ldots, X_n$. It will be shown that the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is given by

$$\begin{aligned}
g(y_1, y_2, \ldots, y_n) &= (n!)\, f(y_1) f(y_2) \cdots f(y_n), \qquad a < y_1 < y_2 < \cdots < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned} \tag{1}$$

We shall prove this only for the case $n = 3$, but the argument is seen to be entirely general. With $n = 3$, the joint p.d.f. of $X_1, X_2, X_3$ is $f(x_1) f(x_2) f(x_3)$. Consider a probability such as $\Pr(a < X_1 = X_2 < b,\ a < X_3 < b)$. This probability is given by

$$\int_a^b \int_a^b \int_{x_2}^{x_2} f(x_1) f(x_2) f(x_3)\, dx_1\, dx_2\, dx_3 = 0,$$

since

$$\int_{x_2}^{x_2} f(x_1)\, dx_1$$

is defined in calculus to be zero. As has been pointed out, we may, without altering the distribution of $X_1, X_2, X_3$, define the joint p.d.f. $f(x_1) f(x_2) f(x_3)$ to be zero at all points $(x_1, x_2, x_3)$ that have at least two of their coordinates equal. Then the set $\mathcal{A}$, where $f(x_1) f(x_2) f(x_3) > 0$, is the union of the six mutually disjoint sets:

$$\begin{aligned}
A_1 &= \{(x_1, x_2, x_3) : a < x_1 < x_2 < x_3 < b\}, \\
A_2 &= \{(x_1, x_2, x_3) : a < x_2 < x_1 < x_3 < b\}, \\
A_3 &= \{(x_1, x_2, x_3) : a < x_1 < x_3 < x_2 < b\}, \\
A_4 &= \{(x_1, x_2, x_3) : a < x_2 < x_3 < x_1 < b\}, \\
A_5 &= \{(x_1, x_2, x_3) : a < x_3 < x_1 < x_2 < b\}, \\
A_6 &= \{(x_1, x_2, x_3) : a < x_3 < x_2 < x_1 < b\}.
\end{aligned}$$

There are six of these sets because we can arrange $x_1, x_2, x_3$ in precisely $3! = 6$ ways. Consider the functions $y_1 =$ minimum of $x_1, x_2, x_3$; $y_2 =$ middle in magnitude of $x_1, x_2, x_3$; and $y_3 =$ maximum of $x_1, x_2, x_3$. These functions define one-to-one transformations that map each of $A_1, A_2, \ldots, A_6$ onto the same set $\mathcal{B} = \{(y_1, y_2, y_3) : a < y_1 < y_2 < y_3 < b\}$. The inverse functions are, for points in $A_1$, $x_1 = y_1$, $x_2 = y_2$, $x_3 = y_3$; for points in $A_2$, they are $x_1 = y_2$, $x_2 = y_1$, $x_3 = y_3$; and so on, for each of the remaining four sets. Then we have that

$$J_1 = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1 \qquad \text{and} \qquad J_2 = \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -1.$$

It is easily verified that the absolute value of each of the $3! = 6$ Jacobians is $+1$. Thus the joint p.d.f. of the three order statistics $Y_1 =$ minimum of $X_1, X_2, X_3$; $Y_2 =$ middle in magnitude of $X_1, X_2, X_3$; $Y_3 =$ maximum of $X_1, X_2, X_3$ is

$$\begin{aligned}
g(y_1, y_2, y_3) &= |J_1| f(y_1) f(y_2) f(y_3) + |J_2| f(y_2) f(y_1) f(y_3) + \cdots + |J_6| f(y_3) f(y_2) f(y_1), \qquad a < y_1 < y_2 < y_3 < b, \\
&= (3!)\, f(y_1) f(y_2) f(y_3), \qquad a < y_1 < y_2 < y_3 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

This is Equation (1) with $n = 3$.

In accordance with the natural extension of Theorem 1, Section 2.4, 
to distributions of more than two random variables, it is seen that the 
order statistics, unlike the items of the random sample, are dependent. 

Example 1. Let $X$ denote a random variable of the continuous type with a p.d.f. $f(x)$ that is positive and continuous, provided that $a < x < b$ and is zero elsewhere. The distribution function $F(x)$ of $X$ may be written

$$F(x) = \int_a^x f(w)\, dw, \qquad a < x < b.$$

If $x \le a$, $F(x) = 0$; and if $b \le x$, $F(x) = 1$. Thus there is a unique median $m$ of the distribution with $F(m) = \tfrac{1}{2}$. Let $X_1, X_2, X_3$ denote a random sample from this distribution and let $Y_1 < Y_2 < Y_3$ denote the order statistics of the sample. We shall compute the probability that $Y_2 \le m$. The joint p.d.f. of the three order statistics is

$$\begin{aligned}
g(y_1, y_2, y_3) &= 6 f(y_1) f(y_2) f(y_3), \qquad a < y_1 < y_2 < y_3 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

The p.d.f. of $Y_2$ is then

$$\begin{aligned}
h(y_2) &= 6 f(y_2) \int_{y_2}^{b} \int_a^{y_2} f(y_1) f(y_3)\, dy_1\, dy_3 \\
&= 6 f(y_2) F(y_2)[1 - F(y_2)], \qquad a < y_2 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

Accordingly,

$$\begin{aligned}
\Pr(Y_2 \le m) &= 6 \int_a^m \{F(y_2) f(y_2) - [F(y_2)]^2 f(y_2)\}\, dy_2 \\
&= 6 \left\{\frac{[F(y_2)]^2}{2} - \frac{[F(y_2)]^3}{3}\right\}_a^m = \frac{1}{2}.
\end{aligned}$$

The procedure used in Example 1 can be used to obtain general formulas for the marginal probability density functions of the order statistics. We shall do this now. Let $X$ denote a random variable of the continuous type having a p.d.f. $f(x)$ that is positive and continuous, provided that $a < x < b$, and is zero elsewhere. Then the distribution function $F(x)$ may be written

$$\begin{aligned}
F(x) &= 0, \qquad x \le a, \\
&= \int_a^x f(w)\, dw, \qquad a < x < b, \\
&= 1, \qquad b \le x.
\end{aligned}$$

Accordingly, $F'(x) = f(x)$, $a < x < b$. Moreover, if $a < x < b$,

$$\begin{aligned}
1 - F(x) &= F(b) - F(x) \\
&= \int_a^b f(w)\, dw - \int_a^x f(w)\, dw \\
&= \int_x^b f(w)\, dw.
\end{aligned}$$

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from this distribution, and let $Y_1, Y_2, \ldots, Y_n$ denote the order statistics of this random sample. Then the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is

$$\begin{aligned}
g(y_1, y_2, \ldots, y_n) &= n!\, f(y_1) f(y_2) \cdots f(y_n), \qquad a < y_1 < y_2 < \cdots < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

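It will first be shown how the marginal p.d.f. of $Y_n$ may be expressed in terms of the distribution function $F(x)$ and the p.d.f. $f(x)$ of the random variable $X$. If $a < y_n < b$, the marginal p.d.f. of $Y_n$ is given by

$$\begin{aligned}
g_n(y_n) &= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} \int_a^{y_2} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_1\, dy_2\, dy_3 \cdots dy_{n-1} \\
&= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} n! \left(\int_a^{y_2} f(y_1)\, dy_1\right) f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_{n-1} \\
&= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} n!\, F(y_2) f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_{n-1},
\end{aligned}$$

since $F(x) = \int_a^x f(w)\, dw$. Now

$$\int_a^{y_3} F(y_2) f(y_2)\, dy_2 = \frac{[F(y_3)]^2}{2},$$

since $F(a) = 0$. Thus

$$g_n(y_n) = \int_a^{y_n} \cdots \int_a^{y_4} n!\, \frac{[F(y_3)]^2}{2}\, f(y_3) \cdots f(y_n)\, dy_3 \cdots dy_{n-1}.$$

If the successive integrations on $y_3, y_4, \ldots, y_{n-1}$ are carried out, it is seen that

$$\begin{aligned}
g_n(y_n) &= n!\, \frac{[F(y_n)]^{n-1}}{(n - 1)!}\, f(y_n) \\
&= n [F(y_n)]^{n-1} f(y_n), \qquad a < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

The marginal p.d.f. of $Y_1$ is found in the same way. For $a < y_1 < b$,

$$\begin{aligned}
g_1(y_1) &= \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} \int_{y_{n-2}}^{b} \int_{y_{n-1}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_n\, dy_{n-1} \cdots dy_2 \\
&= \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} \int_{y_{n-2}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_{n-1}) [1 - F(y_{n-1})]\, dy_{n-1} \cdots dy_2.
\end{aligned}$$

But

$$\int_{y_{n-2}}^{b} [1 - F(y_{n-1})] f(y_{n-1})\, dy_{n-1} = \frac{[1 - F(y_{n-2})]^2}{2},$$

so that

$$g_1(y_1) = \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} n!\, f(y_1) \cdots f(y_{n-2})\, \frac{[1 - F(y_{n-2})]^2}{2}\, dy_{n-2} \cdots dy_2.$$

Upon completing the integrations, it is found that

$$\begin{aligned}
g_1(y_1) &= n [1 - F(y_1)]^{n-1} f(y_1), \qquad a < y_1 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$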

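Once it is observed that

$$\int_a^x [F(w)]^{\alpha - 1} f(w)\, dw = \frac{[F(x)]^{\alpha}}{\alpha}, \qquad \alpha > 0,$$

and that

$$\int_y^b [1 - F(w)]^{\beta - 1} f(w)\, dw = \frac{[1 - F(y)]^{\beta}}{\beta}, \qquad \beta > 0,$$

it is easy to express the marginal p.d.f. of any order statistic, say $Y_k$, in terms of $F(x)$ and $f(x)$. This is done by evaluating the integral

$$g_k(y_k) = \int_a^{y_k} \cdots \int_a^{y_2} \int_{y_k}^{b} \cdots \int_{y_{n-1}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_n \cdots dy_{k+1}\, dy_1 \cdots dy_{k-1}.$$

The result is

$$\begin{aligned}
g_k(y_k) &= \frac{n!}{(k - 1)!\,(n - k)!}\, [F(y_k)]^{k-1} [1 - F(y_k)]^{n-k} f(y_k), \qquad a < y_k < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned} \tag{2}$$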

Example 2. Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random sample of size 4 from a distribution having p.d.f.

$$\begin{aligned}
f(x) &= 2x, \qquad 0 < x < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

We shall express the p.d.f. of $Y_3$ in terms of $f(x)$ and $F(x)$ and then compute $\Pr(\tfrac{1}{2} < Y_3)$. Here $F(x) = x^2$, provided that $0 < x < 1$, so that

$$\begin{aligned}
g_3(y_3) &= \frac{4!}{2!\,1!}\, (y_3^2)^2 (1 - y_3^2)(2y_3), \qquad 0 < y_3 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

Thus

$$\Pr(\tfrac{1}{2} < Y_3) = \int_{1/2}^{\infty} g_3(y_3)\, dy_3 = \int_{1/2}^{1} 24 (y_3^5 - y_3^7)\, dy_3 = \frac{243}{256}.$$
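Formula (2) translates directly into code. The sketch below is illustrative (the function names and the Simpson integrator are assumptions, not part of the text): it evaluates the marginal p.d.f. of the $k$th order statistic for a user-supplied $f$ and $F$, then recomputes the probability of Example 2 numerically.

```python
import math


def order_stat_pdf(y, k, n, f, F):
    """Formula (2): p.d.f. of the k-th order statistic of a sample of size n.

    `f` and `F` are the p.d.f. and distribution function of X; the caller is
    responsible for evaluating only where a < y < b.
    """
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coef * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)


def integrate(func, lo, hi, n_steps=2000):
    """Composite Simpson's rule on [lo, hi]; n_steps must be even."""
    h = (hi - lo) / n_steps
    total = func(lo) + func(hi)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def f_example(x):
    """p.d.f. of Example 2 on 0 < x < 1."""
    return 2.0 * x


def F_example(x):
    """Distribution function of Example 2 on 0 < x < 1."""
    return x * x


if __name__ == "__main__":
    prob = integrate(lambda y: order_stat_pdf(y, 3, 4, f_example, F_example),
                     0.5, 1.0)
    print(prob, 243 / 256)
```

Integrating the same p.d.f. over the whole interval $(0, 1)$ should return 1, which is a useful check on the combinatorial constant.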



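The joint p.d.f. of any two order statistics, say $Y_i < Y_j$, can likewise be expressed in terms of $F(x)$ and $f(x)$:

$$g_{ij}(y_i, y_j) = \frac{n!}{(i - 1)!\,(j - i - 1)!\,(n - j)!}\, [F(y_i)]^{i-1} [F(y_j) - F(y_i)]^{j-i-1} [1 - F(y_j)]^{n-j} f(y_i) f(y_j) \tag{3}$$

for $a < y_i < y_j < b$, and zero elsewhere.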

Remark. There is an easy method of remembering a p.d.f. like that given in Formula (3). The probability $\Pr(y_i < Y_i < y_i + \Delta_i,\ y_j < Y_j < y_j + \Delta_j)$, where $\Delta_i$ and $\Delta_j$ are small, can be approximated by the following multinomial probability. In $n$ independent trials, $i - 1$ outcomes must be less than $y_i$ (an event that has probability $p_1 = F(y_i)$ on each trial); $j - i - 1$ outcomes must be between $y_i + \Delta_i$ and $y_j$ [an event with approximate probability $p_2 = F(y_j) - F(y_i)$ on each trial]; $n - j$ outcomes must be greater than $y_j + \Delta_j$ (an event with approximate probability $p_3 = 1 - F(y_j)$ on each trial); one outcome must be between $y_i$ and $y_i + \Delta_i$ (an event with approximate probability $p_4 = f(y_i)\Delta_i$ on each trial); and finally one outcome must be between $y_j$ and $y_j + \Delta_j$ [an event with approximate probability $p_5 = f(y_j)\Delta_j$ on each trial]. This multinomial probability is

$$\frac{n!}{(i - 1)!\,(j - i - 1)!\,(n - j)!\,1!\,1!}\, p_1^{i-1} p_2^{j-i-1} p_3^{n-j} p_4 p_5,$$

which is $g_{ij}(y_i, y_j)\Delta_i \Delta_j$.

Certain functions of the order statistics $Y_1, Y_2, \ldots, Y_n$ are important statistics themselves. A few of these are: (a) $Y_n - Y_1$, which is called the range of the random sample; (b) $(Y_1 + Y_n)/2$, which is called the midrange of the random sample; and (c) if $n$ is odd, $Y_{(n+1)/2}$, which is called the median of the random sample.

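Example 3. Let $Y_1, Y_2, Y_3$ be the order statistics of a random sample of size 3 from a distribution having p.d.f.

$$\begin{aligned}
f(x) &= 1, \qquad 0 < x < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

We seek the p.d.f. of the sample range $Z_1 = Y_3 - Y_1$. Since $F(x) = x$, $0 < x < 1$, the joint p.d.f. of $Y_1$ and $Y_3$ is

$$\begin{aligned}
g_{13}(y_1, y_3) &= 6(y_3 - y_1), \qquad 0 < y_1 < y_3 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

In addition to $Z_1 = Y_3 - Y_1$, let $Z_2 = Y_3$. Consider the functions $z_1 = y_3 - y_1$, $z_2 = y_3$, and their inverses $y_1 = z_2 - z_1$, $y_3 = z_2$, so that the corresponding Jacobian of the one-to-one transformation is

$$J = \begin{vmatrix} \dfrac{\partial y_1}{\partial z_1} & \dfrac{\partial y_1}{\partial z_2} \\ \dfrac{\partial y_3}{\partial z_1} & \dfrac{\partial y_3}{\partial z_2} \end{vmatrix} = \begin{vmatrix} -1 & 1 \\ 0 & 1 \end{vmatrix} = -1.$$

Thus the joint p.d.f. of $Z_1$ and $Z_2$ is

$$\begin{aligned}
h(z_1, z_2) &= |-1|\, 6z_1 = 6z_1, \qquad 0 < z_1 < z_2 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$

Accordingly, the p.d.f. of the range $Z_1 = Y_3 - Y_1$ of the random sample of size 3 is

$$\begin{aligned}
h_1(z_1) &= \int_{z_1}^{1} 6z_1\, dz_2 = 6z_1(1 - z_1), \qquad 0 < z_1 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}$$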
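The p.d.f. of the range in Example 3 can be compared with a simulation. The following is a rough sketch, assuming Python's standard `random` module; the seed, sample count, and function names are arbitrary illustrative choices. Integrating $6z(1 - z)$ gives the distribution function $3z^2 - 2z^3$ on $0 < z < 1$, and the empirical frequency of small ranges should be close to it.

```python
import random


def range_pdf(z):
    """p.d.f. h1(z) = 6 z (1 - z) of the range of three uniform(0, 1) values."""
    return 6.0 * z * (1.0 - z) if 0.0 < z < 1.0 else 0.0


def range_cdf(z):
    """Integral of h1 from 0 to z: 3 z^2 - 2 z^3 on 0 < z < 1."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 3.0 * z * z - 2.0 * z ** 3


def simulate_range_cdf(z, n_samples=100_000, seed=2024):
    """Empirical Pr(Y3 - Y1 <= z) from simulated samples of size 3."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(3)]
        if max(sample) - min(sample) <= z:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    for z in (0.25, 0.5, 0.75):
        print(z, range_cdf(z), simulate_range_cdf(z))
```

The simulated column fluctuates from seed to seed, so only approximate agreement with the formula should be expected.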

EXERCISES

4.56. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size 4 from the distribution having p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Find $\Pr(3 \le Y_4)$.

4.57. Let $X_1, X_2, X_3$ be a random sample from a distribution of the continuous type having p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere.

(a) Compute the probability that the smallest of these $X_i$ exceeds the median of the distribution.

(b) If $Y_1 < Y_2 < Y_3$ are the order statistics, find the correlation between $Y_2$ and $Y_3$.

4.58. Let $f(x) = \tfrac{1}{6}$, $x = 1, 2, 3, 4, 5, 6$, zero elsewhere, be the p.d.f. of a distribution of the discrete type. Show that the p.d.f. of the smallest observation of a random sample of size 5 from this distribution is

$$g_1(y_1) = \left(\frac{7 - y_1}{6}\right)^5 - \left(\frac{6 - y_1}{6}\right)^5, \qquad y_1 = 1, 2, \ldots, 6,$$

zero elsewhere. Note that in this exercise the random sample is from a distribution of the discrete type. All formulas in the text were derived under the assumption that the random sample is from a distribution of the continuous type and are not applicable. Why?

4.59. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ denote the order statistics of a random sample of size 5 from a distribution having p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that $Z_1 = Y_2$ and $Z_2 = Y_4 - Y_2$ are independent.

Hint: First find the joint p.d.f. of $Y_2$ and $Y_4$.

4.60. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from a distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere. Show that the $k$th order statistic $Y_k$ has a beta p.d.f. with parameters $\alpha = k$ and $\beta = n - k + 1$.

4.61. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics from a Weibull distribution, Exercise 3.44, Section 3.3. Find the distribution function and p.d.f. of $Y_1$.

4.62. Find the probability that the range of a random sample of size 4 from the uniform distribution having the p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere, is less than $\tfrac{1}{2}$.

4.63. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 from a distribution having the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Show that $Z_1 = Y_1/Y_2$, $Z_2 = Y_2/Y_3$, and $Z_3 = Y_3$ are mutually independent.

4.64. If a random sample of size 2 is taken from a distribution having p.d.f. $f(x) = 2(1 - x)$, $0 < x < 1$, zero elsewhere, compute the probability that one sample observation is at least twice as large as the other.

4.65. Let $Y_1 < Y_2 < Y_3$ denote the order statistics of a random sample of size 3 from a distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere. Let $Z = (Y_1 + Y_3)/2$ be the midrange of the sample. Find the p.d.f. of $Z$.

4.66. Let $Y_1 < Y_2$ denote the order statistics of a random sample of size 2 from $N(0, \sigma^2)$.

(a) Show that $E(Y_1) = -\sigma/\sqrt{\pi}$.

Hint: Evaluate $E(Y_1)$ by using the joint p.d.f. of $Y_1$ and $Y_2$, and first integrating on $y_1$.

(b) Find the covariance of $Y_1$ and $Y_2$.

4.67. Let $Y_1 < Y_2$ be the order statistics of a random sample of size 2 from a distribution of the continuous type which has p.d.f. $f(x)$ such that $f(x) > 0$, provided that $x \ge 0$, and $f(x) = 0$ elsewhere. Show that the independence of $Z_1 = Y_1$ and $Z_2 = Y_2 - Y_1$ characterizes the gamma p.d.f. $f(x)$, which has parameters $\alpha = 1$ and $\beta > 0$.

Hint: Use the change-of-variable technique to find the joint p.d.f. of $Z_1$ and $Z_2$ from that of $Y_1$ and $Y_2$. Accept the fact that the functional equation $h(0)h(x + y) \equiv h(x)h(y)$ has the solution $h(x) = c_1 e^{c_2 x}$, where $c_1$ and $c_2$ are constants.

4.68. Let Y, < Y 2 < Y 3 < Y A be the order statistics of a random sample of size 
n = 4 from a distribution with p.d.f. J{x) = 2x, 0 < x < 1. 

(a) Find the joint p.d.f. of Y 3 and Y 4 . 

(b) Find the conditional p.d.f. of Kj, given Y 4 = y 4 . 

(c) Evaluate £(J" 3 | 少 4 ). 

4.69. Two numbers are selected at random from the interval (0, 1). If these
values are uniformly and independently distributed, compute the probability
that the three resulting line segments, by cutting the interval at the
numbers, can form a triangle.

4.70. Let $X$ and $Y$ denote independent random variables with respective
probability density functions $f(x) = 2x$, $0 < x < 1$, zero elsewhere,
and $g(y) = 3y^2$, $0 < y < 1$, zero elsewhere. Let $U = \min(X, Y)$ and
$V = \max(X, Y)$. Find the joint p.d.f. of $U$ and $V$.

Hint: Here the two inverse transformations are given by $x = u$, $y = v$
and $x = v$, $y = u$.

4.71. Let the joint p.d.f. of $X$ and $Y$ be $f(x, y) = \frac{12}{7} x(x + y)$, $0 < x < 1$,
$0 < y < 1$, zero elsewhere. Let $U = \min(X, Y)$ and $V = \max(X, Y)$. Find
the joint p.d.f. of $U$ and $V$.





4.72. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution of either
type. A measure of spread is Gini's mean difference

$$G = \sum_{j=2}^{n} \sum_{i=1}^{j-1} |X_i - X_j| \Big/ \binom{n}{2}.$$

(a) If $n = 10$, find $a_1, a_2, \ldots, a_{10}$ so that $G = \sum_{i=1}^{10} a_i Y_i$, where
$Y_1, Y_2, \ldots, Y_{10}$ are the order statistics of the sample.

(b) Show that $E(G) = 2\sigma/\sqrt{\pi}$ if the sample arises from the normal
distribution $N(\mu, \sigma^2)$.

4.73. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the exponential distribution with p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$,
zero elsewhere.

(a) Show that $Z_1 = nY_1$, $Z_2 = (n - 1)(Y_2 - Y_1)$, $Z_3 = (n - 2)(Y_3 - Y_2)$,
$\ldots$, $Z_n = Y_n - Y_{n-1}$ are independent and that each $Z_i$ has the
exponential distribution.

(b) Demonstrate that all linear functions of $Y_1, Y_2, \ldots, Y_n$, such as
$\sum_{1}^{n} a_i Y_i$, can be expressed as linear functions of independent random
variables.

4.74. In the Program Evaluation and Review Technique (PERT), we are
interested in the total time to complete a project that is comprised of
a large number of subprojects. For illustration, let $X_1, X_2, X_3$ be three
independent random times for three subprojects. If these subprojects are
in series (the first one must be completed before the second starts, etc.),
then we are interested in the sum $Y = X_1 + X_2 + X_3$. If these are in
parallel (can be worked on simultaneously), then we are interested in
$Z = \max(X_1, X_2, X_3)$. In the case each of these random variables has the
uniform distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere, find
(a) the p.d.f. of $Y$ and (b) the p.d.f. of $Z$.

4.7 The Moment-Generating-Function Technique

The change-of-variable procedure has been seen, in certain cases, 
to be an effective method of finding the distribution of a function of 
several random variables. An alternative procedure, built around the 
concept of the m.g.f. of a distribution, will be presented in this section. 
This procedure is particularly effective in certain instances. We should 
recall that an m.g.f., when it exists, is unique and that it uniquely 
determines the distribution of probability. 

Let $h(x_1, x_2, \ldots, x_n)$ denote the joint p.d.f. of the $n$ random
variables $X_1, X_2, \ldots, X_n$. These random variables may or may not be
the observations of a random sample from some distribution that has
a given p.d.f. $f(x)$. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$. We seek $g(y_1)$, the
p.d.f. of the random variable $Y_1$. Consider the m.g.f. of $Y_1$. If it exists,
it is given by

$$M(t) = E(e^{tY_1}) = \int_{-\infty}^{\infty} e^{ty_1} g(y_1)\, dy_1$$

in the continuous case. It would seem that we need to know $g(y_1)$ before
we can compute $M(t)$. That this is not the case is a fundamental fact.
To see this consider

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[t u_1(x_1, \ldots, x_n)]\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n, \tag{1}$$


which we assume to exist for $-h < t < h$. We shall introduce $n$
new variables of integration. They are $y_1 = u_1(x_1, x_2, \ldots, x_n), \ldots,$
$y_n = u_n(x_1, x_2, \ldots, x_n)$. Momentarily, we assume that these functions
define a one-to-one transformation. Let $x_i = w_i(y_1, y_2, \ldots, y_n)$,
$i = 1, 2, \ldots, n$, denote the inverse functions and let $J$ denote the
Jacobian. Under this transformation, display (1) becomes

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{ty_1} |J|\, h(w_1, \ldots, w_n)\, dy_2 \cdots dy_n\, dy_1. \tag{2}$$

In accordance with Section 4.5,

$$|J|\, h[w_1(y_1, \ldots, y_n), \ldots, w_n(y_1, \ldots, y_n)]$$

is the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$. The marginal p.d.f. $g(y_1)$ of $Y_1$
is obtained by integrating this joint p.d.f. on $y_2, \ldots, y_n$. Since the
factor $e^{ty_1}$ does not involve the variables $y_2, \ldots, y_n$, display (2) may
be written as

$$\int_{-\infty}^{\infty} e^{ty_1} g(y_1)\, dy_1. \tag{3}$$

But this is by definition the m.g.f. $M(t)$ of the distribution of $Y_1$.
That is, we can compute $E\{\exp[t u_1(X_1, \ldots, X_n)]\}$ and have the value
of $E(e^{tY_1})$, where $Y_1 = u_1(X_1, \ldots, X_n)$. This fact provides another
technique to help us find the p.d.f. of a function of several random
variables. For if the m.g.f. of $Y_1$ is seen to be that of a certain kind of
distribution, the uniqueness property makes it certain that $Y_1$ has that
kind of distribution. When the p.d.f. of $Y_1$ is obtained in this manner,
we say that we use the moment-generating-function technique.

The reader will observe that we have assumed the transformation
to be one-to-one. We did this for simplicity of presentation. If the
transformation is not one-to-one, let

$$x_j = w_{ji}(y_1, \ldots, y_n), \quad j = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, k,$$

denote the $k$ groups of $n$ inverse functions each. Let $J_i$, $i = 1, 2, \ldots, k$,
denote the $k$ Jacobians. Then

$$\sum_{i=1}^{k} |J_i|\, h[w_{1i}(y_1, \ldots, y_n), \ldots, w_{ni}(y_1, \ldots, y_n)] \tag{4}$$

is the joint p.d.f. of $Y_1, \ldots, Y_n$. Then display (1) becomes display (2)
with $|J|\, h(w_1, \ldots, w_n)$ replaced by display (4). Hence our result is valid
if the transformation is not one-to-one. It seems evident that we can
treat the discrete case in an analogous manner with the same result.

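It should be noted that the expectation of $Y_1$ can be computed in
like manner. That is,

$$E(Y_1) = \int_{-\infty}^{\infty} y_1 g(y_1)\, dy_1 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u_1(x_1, \ldots, x_n)\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n,$$

and this fact has been mentioned earlier in the book. Moreover, this
holds for the expectation of any function of $Y_1$, say $w(Y_1)$; that is,

$$E[w(Y_1)] = \int_{-\infty}^{\infty} w(y_1) g(y_1)\, dy_1 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} w[u_1(x_1, \ldots, x_n)]\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$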

We shall now give some examples and prove some theorems where 
we use the moment-generating-function technique. In the first example, 
to emphasize the nature of the problem, we find the distribution of a 
rather simple statistic both by a direct probabilistic argument and by 
the moment-generating-function technique. 

Example 1. Let the independent random variables $X_1$ and $X_2$ have the
same p.d.f.

$$f(x) = \frac{x}{6}, \quad x = 1, 2, 3,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere};$$

so the joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1) f(x_2) = \frac{x_1 x_2}{36}, \quad x_1 = 1, 2, 3, \quad x_2 = 1, 2, 3,$$
$$\phantom{f(x_1) f(x_2)} = 0 \quad \text{elsewhere}.$$

A probability, such as $\Pr(X_1 = 2, X_2 = 3)$, can be seen immediately to be
$(2)(3)/36 = 1/6$. However, consider a probability such as $\Pr(X_1 + X_2 = 3)$. The
computation can be made by first observing that the event $X_1 + X_2 = 3$ is the
union, exclusive of the events with probability zero, of the two mutually
exclusive events $(X_1 = 1, X_2 = 2)$ and $(X_1 = 2, X_2 = 1)$. Thus

$$\Pr(X_1 + X_2 = 3) = \Pr(X_1 = 1, X_2 = 2) + \Pr(X_1 = 2, X_2 = 1) = \frac{(1)(2)}{36} + \frac{(2)(1)}{36} = \frac{4}{36}.$$

More generally, let $y$ represent any of the numbers 2, 3, 4, 5, 6. The probability
of each of the events $X_1 + X_2 = y$, $y = 2, 3, 4, 5, 6$, can be computed as in the
case $y = 3$. Let $g(y) = \Pr(X_1 + X_2 = y)$. Then the table

    y       2       3       4       5       6
    g(y)    1/36    4/36    10/36   12/36   9/36

gives the values of $g(y)$ for $y = 2, 3, 4, 5, 6$. For all other values of $y$, $g(y) = 0$.
What we have actually done is to define a new random variable $Y$ by
$Y = X_1 + X_2$, and we have found the p.d.f. of this random variable $Y$.
We shall now solve the same problem by the moment-generating-function
technique.

Now the m.g.f. of $Y$ is

$$M(t) = E(e^{t(X_1 + X_2)}) = E(e^{tX_1} e^{tX_2}) = E(e^{tX_1}) E(e^{tX_2}),$$

since $X_1$ and $X_2$ are independent. In this example $X_1$ and $X_2$ have the same
distribution, so they have the same m.g.f.; that is,

$$E(e^{tX_1}) = E(e^{tX_2}) = \tfrac{1}{6} e^{t} + \tfrac{2}{6} e^{2t} + \tfrac{3}{6} e^{3t}.$$

Thus

$$M(t) = \left(\tfrac{1}{6} e^{t} + \tfrac{2}{6} e^{2t} + \tfrac{3}{6} e^{3t}\right)^2 = \tfrac{1}{36} e^{2t} + \tfrac{4}{36} e^{3t} + \tfrac{10}{36} e^{4t} + \tfrac{12}{36} e^{5t} + \tfrac{9}{36} e^{6t}.$$

This form of $M(t)$ tells us immediately that the p.d.f. $g(y)$ of $Y$ is zero except
at $y = 2, 3, 4, 5, 6$, and that $g(y)$ assumes the values $\tfrac{1}{36}, \tfrac{4}{36}, \tfrac{10}{36}, \tfrac{12}{36}, \tfrac{9}{36}$,
respectively, at these points where $g(y) > 0$. This is, of course, the same
result that was obtained in the first solution. There appears here to be little,
if any, preference for one solution over the other. But in more complicated
situations, and particularly with random variables of the continuous type, the
moment-generating-function technique can prove very powerful.
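
The two solutions of Example 1 can also be compared numerically. The following is a minimal Python sketch, not part of the original text; the helper names `pmf_by_enumeration`, `poly_mul`, and `pmf_by_mgf` are hypothetical. It relies on the observation that, for a distribution on the positive integers, the m.g.f. is a polynomial in $e^t$, so that squaring the m.g.f. amounts to multiplying two polynomials and $g(y)$ is the coefficient of $e^{yt}$.

```python
from fractions import Fraction
from itertools import product

# p.d.f. of Example 1: f(x) = x/6 for x = 1, 2, 3, zero elsewhere.
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

def pmf_by_enumeration(f):
    """Direct argument: add joint probabilities over each event X1 + X2 = y."""
    g = {}
    for x1, x2 in product(f, repeat=2):
        g[x1 + x2] = g.get(x1 + x2, Fraction(0)) + f[x1] * f[x2]
    return g

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pmf_by_mgf(f):
    """m.g.f. argument: write M(t) as a polynomial in e^t, square it,
    and read g(y) off as the coefficient of e^{yt}."""
    m = [f.get(k, Fraction(0)) for k in range(max(f) + 1)]
    m2 = poly_mul(m, m)
    return {y: c for y, c in enumerate(m2) if c != 0}

g_direct = pmf_by_enumeration(f)
g_mgf = pmf_by_mgf(f)
for y in sorted(g_direct):
    print(y, g_direct[y], g_mgf[y])
```

Exact rational arithmetic is used so that the two answers can be compared for equality rather than to within rounding error; note that `Fraction` prints values in lowest terms (for instance 4/36 appears as 1/9).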

Example 2. Let $X_1$ and $X_2$ be independent with normal distributions
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Define the random variable $Y$ by
$Y = X_1 - X_2$. The problem is to find $g(y)$, the p.d.f. of $Y$. This will be done
by first finding the m.g.f. of $Y$. It is

$$M(t) = E(e^{t(X_1 - X_2)}) = E(e^{tX_1}) E(e^{-tX_2}),$$

since $X_1$ and $X_2$ are independent. It is known that

$$E(e^{tX_1}) = \exp\left(\mu_1 t + \frac{\sigma_1^2 t^2}{2}\right)$$

and that

$$E(e^{tX_2}) = \exp\left(\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right)$$

for all real $t$. Then $E(e^{-tX_2})$ can be obtained from $E(e^{tX_2})$ by replacing $t$ by $-t$.
That is,

$$E(e^{-tX_2}) = \exp\left(-\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right).$$

Finally, then,

$$M(t) = \exp\left(\mu_1 t + \frac{\sigma_1^2 t^2}{2}\right) \exp\left(-\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right) = \exp\left[(\mu_1 - \mu_2)t + \frac{(\sigma_1^2 + \sigma_2^2)t^2}{2}\right].$$

The distribution of $Y$ is completely determined by its m.g.f. $M(t)$, and it is seen
that $Y$ has the p.d.f. $g(y)$, which is $N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$. That is, the difference
between two independent, normally distributed random variables is itself a
random variable which is normally distributed with mean equal to the
difference of the means (in the order indicated) and the variance equal to the
sum of the variances.

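The following theorem, which is a generalization of Example 2, is
very important in distribution theory.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ be independent random variables
having, respectively, the normal distributions $N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2), \ldots,$
and $N(\mu_n, \sigma_n^2)$. The random variable $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$,
where $k_1, k_2, \ldots, k_n$ are real constants, is normally distributed with
mean $k_1 \mu_1 + \cdots + k_n \mu_n$ and variance $k_1^2 \sigma_1^2 + \cdots + k_n^2 \sigma_n^2$. That is, $Y$ is

$$N\left(\sum_{1}^{n} k_i \mu_i,\ \sum_{1}^{n} k_i^2 \sigma_i^2\right).$$

Proof. Because $X_1, X_2, \ldots, X_n$ are independent, the m.g.f. of $Y$ is
given by

$$M(t) = E\{\exp[t(k_1 X_1 + k_2 X_2 + \cdots + k_n X_n)]\} = E(e^{tk_1 X_1}) E(e^{tk_2 X_2}) \cdots E(e^{tk_n X_n}).$$

Now

$$E(e^{tX_i}) = \exp\left(\mu_i t + \frac{\sigma_i^2 t^2}{2}\right),$$

for all real $t$, $i = 1, 2, \ldots, n$. Hence we have

$$E(e^{tk_i X_i}) = \exp\left[\mu_i (k_i t) + \frac{\sigma_i^2 (k_i t)^2}{2}\right].$$

That is, the m.g.f. of $Y$ is

$$M(t) = \prod_{i=1}^{n} \exp\left[(k_i \mu_i)t + \frac{(k_i^2 \sigma_i^2)t^2}{2}\right] = \exp\left[\left(\sum_{1}^{n} k_i \mu_i\right)t + \frac{\left(\sum_{1}^{n} k_i^2 \sigma_i^2\right)t^2}{2}\right].$$

But this is the m.g.f. of a distribution that is $N\left(\sum_{1}^{n} k_i \mu_i, \sum_{1}^{n} k_i^2 \sigma_i^2\right)$.
This is the desired result.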
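
As an illustration of Theorem 1, the parameters of $Y$ can be computed mechanically from the constants $k_i$ and the parameters of the $X_i$. The sketch below is an addition in Python, not taken from the text; the function names and the numerical parameters are hypothetical. It also checks, at one value of $t$, that the product of the individual m.g.f.s agrees with the m.g.f. of the claimed normal distribution.

```python
import math

def normal_mgf(t, mu, var):
    """m.g.f. of N(mu, var): exp(mu*t + var*t^2/2)."""
    return math.exp(mu * t + var * t * t / 2)

def linear_combination_of_normals(ks, mus, variances):
    """Mean and variance of Y = sum k_i X_i for independent X_i ~ N(mu_i, var_i)."""
    if not (len(ks) == len(mus) == len(variances)):
        raise ValueError("ks, mus, and variances must have the same length")
    mean = sum(k * mu for k, mu in zip(ks, mus))
    variance = sum(k * k * v for k, v in zip(ks, variances))
    return mean, variance

# Example 2 as a special case, Y = X1 - X2, with hypothetical parameters.
ks, mus, variances = [1, -1], [5.0, 3.0], [4.0, 9.0]
mean, variance = linear_combination_of_normals(ks, mus, variances)

# Product of the individual m.g.f.s E(exp(t*k_i*X_i)) at one value of t.
t = 0.3
product_of_mgfs = math.prod(
    normal_mgf(k * t, mu, v) for k, mu, v in zip(ks, mus, variances)
)
print(mean, variance)
print(math.isclose(product_of_mgfs, normal_mgf(t, mean, variance)))
```

Agreement at a single $t$ is only a sanity check, of course; the theorem itself rests on the identity holding for all real $t$ together with the uniqueness of the m.g.f.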


The next theorem is a generalization of Theorem 1. 

Theorem 2. If $X_1, X_2, \ldots, X_n$ are independent random variables
with respective moment-generating functions $M_i(t)$, $i = 1, 2, 3, \ldots, n$,
then the moment-generating function of

$$Y = \sum_{i=1}^{n} a_i X_i,$$

where $a_1, a_2, \ldots, a_n$ are real constants, is

$$M_Y(t) = \prod_{i=1}^{n} M_i(a_i t).$$

Proof. The m.g.f. of $Y$ is given by

$$M_Y(t) = E[e^{tY}] = E[e^{t(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n)}] = E[e^{a_1 t X_1}] E[e^{a_2 t X_2}] \cdots E[e^{a_n t X_n}]$$

because $X_1, X_2, \ldots, X_n$ are independent. However, since

$$E(e^{tX_i}) = M_i(t),$$

then

$$E(e^{a_i t X_i}) = M_i(a_i t).$$

Thus we have that

$$M_Y(t) = M_1(a_1 t) M_2(a_2 t) \cdots M_n(a_n t) = \prod_{i=1}^{n} M_i(a_i t).$$


A corollary follows immediately, and it will be used in some 
important examples. 

Corollary. If $X_1, X_2, \ldots, X_n$ are observations of a random sample
from a distribution with moment-generating function $M(t)$, then

(a) The moment-generating function of $Y = \sum_{i=1}^{n} X_i$ is

$$M_Y(t) = \prod_{i=1}^{n} M(t) = [M(t)]^n;$$

(b) The moment-generating function of $\bar{X} = \sum_{i=1}^{n} (1/n) X_i$ is

$$M_{\bar{X}}(t) = \prod_{i=1}^{n} M\left(\frac{t}{n}\right) = \left[M\left(\frac{t}{n}\right)\right]^n.$$

Proof. For (a), let $a_i = 1$, $i = 1, 2, \ldots, n$, in Theorem 2. For (b),
take $a_i = 1/n$, $i = 1, 2, \ldots, n$.





The following examples and the exercises give some important applications
of Theorem 2 and its corollary.

Example 3. Let $X_1, X_2, \ldots, X_n$ denote the outcomes on $n$ Bernoulli trials.
The m.g.f. of $X_i$, $i = 1, 2, \ldots, n$, is

$$M(t) = 1 - p + pe^{t}.$$

If $Y = \sum_{1}^{n} X_i$, then

$$M_Y(t) = \prod_{i=1}^{n} (1 - p + pe^{t}) = (1 - p + pe^{t})^n.$$

Thus we again see that $Y$ is $b(n, p)$.


Example 4. Let $X_1, X_2, X_3$ be the observations of a random sample of size
$n = 3$ from the exponential distribution having mean $\beta$ and, of course, m.g.f.
$M(t) = 1/(1 - \beta t)$, $t < 1/\beta$. The m.g.f. of $Y = X_1 + X_2 + X_3$ is

$$M_Y(t) = [(1 - \beta t)^{-1}]^3 = (1 - \beta t)^{-3}, \quad t < 1/\beta,$$

which is that of a gamma distribution with parameters $\alpha = 3$ and $\beta$. Thus $Y$
has this distribution. On the other hand, the m.g.f. of $\bar{X}$ is

$$M_{\bar{X}}(t) = \left[\left(1 - \frac{\beta t}{3}\right)^{-1}\right]^3 = \left(1 - \frac{\beta t}{3}\right)^{-3}, \quad t < 3/\beta;$$

and hence the distribution of $\bar{X}$ is gamma with parameters $\alpha = 3$ and $\beta/3$,
respectively.
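
Theorem 2 translates directly into a rule for composing functions, which the following Python sketch illustrates. It is an addition, not part of the text: the helper name `mgf_of_linear_combination` and the numerical values of $n$, $p$, and $\beta$ are hypothetical. Each m.g.f. is represented as an ordinary function of $t$, and the m.g.f. of $Y = \sum a_i X_i$ is built as the product of the $M_i(a_i t)$. The results are compared with the closed forms of Examples 3 and 4 at one value of $t$ inside the region where the m.g.f.s exist.

```python
import math

def mgf_of_linear_combination(mgfs, coeffs):
    """Theorem 2: for independent X_i with m.g.f.s M_i, the m.g.f. of
    Y = sum a_i X_i is the function t -> prod_i M_i(a_i * t)."""
    def M_Y(t):
        result = 1.0
        for M_i, a_i in zip(mgfs, coeffs):
            result *= M_i(a_i * t)
        return result
    return M_Y

# Example 3 with hypothetical values n = 4, p = 0.3.
n, p = 4, 0.3
def bernoulli_mgf(t):
    return 1 - p + p * math.exp(t)
M_sum = mgf_of_linear_combination([bernoulli_mgf] * n, [1] * n)

# Example 4 with a hypothetical mean beta = 2 (m.g.f. valid for t < 1/beta).
beta = 2.0
def exponential_mgf(t):
    return 1 / (1 - beta * t)
M_mean = mgf_of_linear_combination([exponential_mgf] * 3, [1 / 3] * 3)

t = 0.2
print(math.isclose(M_sum(t), (1 - p + p * math.exp(t)) ** n))
print(math.isclose(M_mean(t), (1 - beta * t / 3) ** -3))
```

Representing the m.g.f.s as closures keeps the sketch close to the statement of the theorem; it does not by itself identify the resulting distribution, which still requires recognizing the form of $M_Y(t)$.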


The next example is so important that we state it as a theorem. 

Theorem 3. Let $X_1, X_2, \ldots, X_n$ be independent variables that have,
respectively, the chi-square distributions $\chi^2(r_1), \chi^2(r_2), \ldots,$ and $\chi^2(r_n)$.
Then the random variable $Y = X_1 + X_2 + \cdots + X_n$ has a chi-square
distribution with $r_1 + \cdots + r_n$ degrees of freedom; that is, $Y$ is
$\chi^2(r_1 + \cdots + r_n)$.

Proof. Since

$$M_i(t) = E(e^{tX_i}) = (1 - 2t)^{-r_i/2}, \quad t < \tfrac{1}{2}, \quad i = 1, 2, \ldots, n,$$

we have, using Theorem 2 with $a_1 = \cdots = a_n = 1$,

$$M(t) = (1 - 2t)^{-(r_1 + r_2 + \cdots + r_n)/2}, \quad t < \tfrac{1}{2}.$$

But this is the m.g.f. of a distribution that is $\chi^2(r_1 + r_2 + \cdots + r_n)$.
Accordingly, $Y$ has this chi-square distribution.

Next, let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from
a distribution that is $N(\mu, \sigma^2)$. In accordance with Theorem 2 of
Section 3.4, each of the random variables $(X_i - \mu)^2/\sigma^2$, $i = 1, 2, \ldots, n$,
is $\chi^2(1)$. Moreover, these $n$ random variables are independent.
Accordingly, by Theorem 3, the random variable $Y = \sum_{1}^{n} [(X_i - \mu)/\sigma]^2$
is $\chi^2(n)$. This proves the following theorem.

Theorem 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$
from a distribution that is $N(\mu, \sigma^2)$. The random variable

$$Y = \sum_{1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

has a chi-square distribution with n degrees of freedom.
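
Theorem 4 can be examined empirically. The following Python sketch is an illustrative addition, not part of the text; the function name, the seed, and the parameter values are hypothetical. It simulates many samples from a normal distribution, forms $Y$ for each, and compares the observed mean and variance of $Y$ with $n$ and $2n$, which are the mean and variance of a $\chi^2(n)$ distribution. A simulation of this kind suggests, but does not prove, the result.

```python
import random
import statistics

def simulate_chi_square_statistic(n, mu, sigma, reps, seed=0):
    """Simulate Y = sum_i ((X_i - mu)/sigma)^2 for `reps` samples of size n
    drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(sum(((x - mu) / sigma) ** 2 for x in sample))
    return values

n = 5
ys = simulate_chi_square_statistic(n=n, mu=10.0, sigma=3.0, reps=20000)
print("observed mean:", statistics.mean(ys), "expected:", n)
print("observed variance:", statistics.variance(ys), "expected:", 2 * n)
```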

Not always do we sample from a distribution of one random
variable. Let the random variables $X$ and $Y$ have the joint p.d.f.
$f(x, y)$ and let the $2n$ random variables $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
have the joint p.d.f.

$$f(x_1, y_1) f(x_2, y_2) \cdots f(x_n, y_n).$$

The $n$ random pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are then independent
and are said to constitute a random sample of size $n$ from the
distribution of $X$ and $Y$. In the next paragraph we shall take $f(x, y)$ to
be the normal bivariate p.d.f., and we shall solve a problem in sampling
theory when we are sampling from this two-variable distribution.

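Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of
size $n$ from a bivariate normal distribution with p.d.f. $f(x, y)$ and
parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. We wish to find the joint p.d.f. of the
two statistics $\bar{X} = \sum_{1}^{n} X_i/n$ and $\bar{Y} = \sum_{1}^{n} Y_i/n$. We call $\bar{X}$ the mean of
$X_1, \ldots, X_n$ and $\bar{Y}$ the mean of $Y_1, \ldots, Y_n$. Since the joint p.d.f. of
the $2n$ random variables $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is given by

$$h = f(x_1, y_1) f(x_2, y_2) \cdots f(x_n, y_n),$$

the m.g.f. of the two means $\bar{X}$ and $\bar{Y}$ is given by

$$M(t_1, t_2) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(\frac{t_1 \sum_{1}^{n} x_i}{n} + \frac{t_2 \sum_{1}^{n} y_i}{n}\right) h\, dx_1 \cdots dy_n$$
$$\phantom{M(t_1, t_2)} = \prod_{i=1}^{n} \left[\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left(\frac{t_1 x_i}{n} + \frac{t_2 y_i}{n}\right) f(x_i, y_i)\, dx_i\, dy_i\right].$$
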



The justification of the form of the right-hand member of the second equality
is that each pair $(X_i, Y_i)$ has the same p.d.f. and that these $n$ pairs are
independent. The twofold integral in the brackets in the last equality is the
joint m.g.f. of $X_i$ and $Y_i$ (see Section 3.5) with $t_1$ replaced by $t_1/n$ and $t_2$
replaced by $t_2/n$. Accordingly,

$$M(t_1, t_2) = \prod_{i=1}^{n} \exp\left[\frac{t_1 \mu_1}{n} + \frac{t_2 \mu_2}{n} + \frac{\sigma_1^2 (t_1/n)^2 + 2\rho \sigma_1 \sigma_2 (t_1/n)(t_2/n) + \sigma_2^2 (t_2/n)^2}{2}\right]$$
$$\phantom{M(t_1, t_2)} = \exp\left[t_1 \mu_1 + t_2 \mu_2 + \frac{(\sigma_1^2/n) t_1^2 + 2\rho (\sigma_1 \sigma_2/n) t_1 t_2 + (\sigma_2^2/n) t_2^2}{2}\right].$$

But this is the m.g.f. of a bivariate normal distribution with means
$\mu_1$ and $\mu_2$, variances $\sigma_1^2/n$ and $\sigma_2^2/n$, and correlation coefficient $\rho$;
therefore, $\bar{X}$ and $\bar{Y}$ have this joint distribution.


EXERCISES 

4.75. Let the i.i.d. random variables $X_1$ and $X_2$ have the same p.d.f. $f(x) = \tfrac{1}{6}$,
$x = 1, 2, 3, 4, 5, 6$, zero elsewhere. Find the p.d.f. of $Y = X_1 + X_2$. Note,
under appropriate assumptions, that $Y$ may be interpreted as the sum of
the spots that appear when two dice are cast.

4.76. Let $X_1$ and $X_2$ be independent with normal distributions $N(6, 1)$ and
$N(7, 1)$, respectively. Find $\Pr(X_1 > X_2)$.

Hint: Write $\Pr(X_1 > X_2) = \Pr(X_1 - X_2 > 0)$ and determine the
distribution of $X_1 - X_2$.

4.77. Let $X_1$ and $X_2$ be independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ have chi-square distributions with $r_1$ and $r$ degrees of freedom,
respectively. Here $r_1 < r$. Show that $X_2$ has a chi-square distribution with
$r - r_1$ degrees of freedom.

Hint: Write $M(t) = E(e^{t(X_1 + X_2)})$ and make use of the independence of $X_1$
and $X_2$.

4.78. Let the independent random variables $X_1$ and $X_2$ have binomial
distributions with parameters $n_1$, $p_1 = \tfrac{1}{2}$ and $n_2$, $p_2 = \tfrac{1}{2}$, respectively. Show
that $Y = X_1 - X_2 + n_2$ has a binomial distribution with parameters
$n = n_1 + n_2$, $p = \tfrac{1}{2}$.

4.79. Let $X_1, X_2, X_3$ be a random sample of size $n = 3$ from $N(1, 4)$. Compute
$P(X_1 + 2X_2 - 2X_3 > 7)$.

4.80. Let $X_1$ and $X_2$ be two independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ have Poisson distributions with means $\mu_1$ and $\mu > \mu_1$,
respectively. Find the distribution of $X_2$.

4.81. Let $X_1$, $X_2$ be two independent gamma random variables with
parameters $\alpha_1 = 3$, $\beta_1 = 3$ and $\alpha_2 = 5$, $\beta_2 = 1$, respectively.

(a) Find the m.g.f. of $Y = 2X_1 + 6X_2$.

(b) What is the distribution of $Y$?

4.82. A certain job is completed in three steps in series. The means and
standard deviations for the steps are (in minutes):

    Step    Mean    Standard Deviation
    1       17      2
    2       13      1
    3       13      2

Assuming independent steps and normal distributions, compute the
probability that the job will take less than 40 minutes to complete.

4.83. Let $X$ be $N(0, 1)$. Use the moment-generating-function technique to
show that $Y = X^2$ is $\chi^2(1)$.

Hint: Evaluate the integral that represents $E(e^{tX^2})$ by writing
$w = x\sqrt{1 - 2t}$, $t < \tfrac{1}{2}$.

4.84. Let $X_1, X_2, \ldots, X_n$ denote $n$ mutually independent random variables
with the moment-generating functions $M_1(t), M_2(t), \ldots, M_n(t)$, respectively.

(a) Show that $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$, where $k_1, k_2, \ldots, k_n$ are real
constants, has the m.g.f. $M(t) = \prod_{1}^{n} M_i(k_i t)$.

(b) If each $k_i = 1$ and if $X_i$ is Poisson with mean $\mu_i$, $i = 1, 2, \ldots, n$, prove
that $Y$ is Poisson with mean $\mu_1 + \cdots + \mu_n$.

4.85. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with m.g.f.
$M(t)$, show that the moment-generating functions of $\sum_{1}^{n} X_i$ and $\sum_{1}^{n} X_i/n$ are,
respectively, $[M(t)]^n$ and $[M(t/n)]^n$.

4.86. In Exercise 4.74 concerning PERT, assume that each of the three
independent variables has the p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere.
Find:

(a) The p.d.f. of $Y$.

(b) The p.d.f. of $Z$.

4.87. If $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$, show that $Z = aX + bY + c$ is

$$N(a\mu_1 + b\mu_2 + c,\ a^2 \sigma_1^2 + 2ab\rho \sigma_1 \sigma_2 + b^2 \sigma_2^2),$$

where $a$, $b$, and $c$ are constants.

Hint: Use the m.g.f. $M(t_1, t_2)$ of $X$ and $Y$ to find the m.g.f. of $Z$.

4.88. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 25$, $\mu_2 = 35$, $\sigma_1^2 = 4$, $\sigma_2^2 = 16$, and $\rho = \tfrac{17}{32}$. If $Z = 3X - 2Y$, find
$\Pr(-2 < Z < 19)$.

4.89. Let $U$ and $V$ be independent random variables, each having a standard
normal distribution. Show that the m.g.f. $E(e^{t(UV)})$ of the product $UV$ is
$(1 - t^2)^{-1/2}$, $-1 < t < 1$.

Hint: Compare $E(e^{tUV})$ with the integral of a bivariate normal p.d.f. that
has means equal to zero.

4.90. Let $X$ and $Y$ have a bivariate normal distribution with the parameters
$\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$, and $\rho$. Show that

$$W = X \quad \text{and} \quad Z = (Y - \mu_2) - \rho(\sigma_2/\sigma_1)(X - \mu_1)$$

are independent normal variables.

4.91. Let $X_1, X_2, X_3$ be a random sample of size $n = 3$ from the standard
normal distribution.

(a) Show that $Y_1 = X_1 + \delta X_3$, $Y_2 = X_2 + \delta X_3$ has a bivariate normal
distribution.

(b) Find the value of $\delta$ so that the correlation coefficient $\rho = \tfrac{1}{2}$.

(c) What additional transformation involving $Y_1$ and $Y_2$ would produce a
bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and
$\sigma_2^2$, and the same correlation coefficient $\rho$?

4.92. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the normal
distribution $N(\mu, \sigma^2)$. Find the joint distribution of $Y = \sum_{1}^{n} a_i X_i$ and
$Z = \sum_{1}^{n} b_i X_i$, where the $a_i$ and $b_i$ are real constants. When, and only when,
are $Y$ and $Z$ independent?

Hint: Note that the joint m.g.f. $E\left[\exp\left(t_1 \sum_{1}^{n} a_i X_i + t_2 \sum_{1}^{n} b_i X_i\right)\right]$ is that
of a bivariate normal distribution.

4.93. Let $X_1, X_2$ be a random sample of size 2 from a distribution with positive
variance and m.g.f. $M(t)$. If $Y = X_1 + X_2$ and $Z = X_1 - X_2$ are independent,
prove that the distribution from which the sample is taken is a normal
distribution.

Hint: Show that

$$m(t_1, t_2) = E\{\exp[t_1(X_1 + X_2) + t_2(X_1 - X_2)]\} = M(t_1 + t_2) M(t_1 - t_2).$$

Express each member of $m(t_1, t_2) = m(t_1, 0)\, m(0, t_2)$ in terms of $M$; differentiate
twice with respect to $t_2$; set $t_2 = 0$; and solve the resulting differential
equation in $M$.


4.8 The Distributions of $\bar{X}$ and $nS^2/\sigma^2$

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n \ge 2$ from a
distribution that is $N(\mu, \sigma^2)$. In this section we shall investigate the
distributions of the mean and the variance of this random sample,
that is, the distributions of the two statistics $\bar{X} = \sum_{1}^{n} X_i/n$ and
$S^2 = \sum_{1}^{n} (X_i - \bar{X})^2/n$.

The problem of the distribution of $\bar{X}$, the mean of the sample, is
solved by the use of Theorem 1 of Section 4.7. We have here, in the
notation of the statement of that theorem, $\mu_1 = \mu_2 = \cdots = \mu_n = \mu$,
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$, and $k_1 = k_2 = \cdots = k_n = 1/n$. Accordingly,
$Y = \bar{X}$ has a normal distribution with mean and variance given by

$$\sum_{1}^{n} \frac{1}{n}\mu = \mu, \qquad \sum_{1}^{n} \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{\sigma^2}{n},$$

respectively. That is, $\bar{X}$ is $N(\mu, \sigma^2/n)$.

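Example 1. Let $\bar{X}$ be the mean of a random sample of size 25 from a
distribution that is $N(75, 100)$. Thus $\bar{X}$ is $N(75, 4)$. Then, for instance,

$$\Pr(71 < \bar{X} < 79) = \Phi\left(\frac{79 - 75}{2}\right) - \Phi\left(\frac{71 - 75}{2}\right) = \Phi(2) - \Phi(-2) = 0.954.$$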
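
The probability in this example can be reproduced with a few lines of Python. The sketch below is an addition, not part of the text; the function names are hypothetical. It uses the standard identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$ for the standard normal distribution function and the fact, just established, that $\bar{X}$ is $N(\mu, \sigma^2/n)$.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_between(a, b, mu, sigma2, n):
    """Pr(a < Xbar < b) when Xbar is the mean of a random sample of size n
    from N(mu, sigma2), so that Xbar is N(mu, sigma2/n)."""
    sd = math.sqrt(sigma2 / n)
    return std_normal_cdf((b - mu) / sd) - std_normal_cdf((a - mu) / sd)

p = prob_sample_mean_between(71, 79, mu=75, sigma2=100, n=25)
print(round(p, 3))  # 0.954
```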

We now take up the problem of the distribution of $S^2$, the variance
of a random sample $X_1, \ldots, X_n$ from a distribution that is $N(\mu, \sigma^2)$.
To do this, let us first consider the joint distribution of $Y_1 = \bar{X}$,
$Y_2 = X_2 - \bar{X}$, $Y_3 = X_3 - \bar{X}$, $\ldots$, $Y_n = X_n - \bar{X}$. The corresponding
inverse transformation

$$x_1 = y_1 - y_2 - y_3 - \cdots - y_n$$
$$x_2 = y_1 + y_2$$
$$x_3 = y_1 + y_3$$
$$\vdots$$
$$x_n = y_1 + y_n$$

has Jacobian $n$. Since

$$\sum_{1}^{n} (x_i - \mu)^2 = \sum_{1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2 = \sum_{1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

because $2(\bar{x} - \mu) \sum_{1}^{n} (x_i - \bar{x}) = 0$, the joint p.d.f. of $X_1, X_2, \ldots, X_n$
can be written

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{\sum_{1}^{n} (x_i - \bar{x})^2}{2\sigma^2} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right],$$

where $\bar{x}$ represents $(x_1 + x_2 + \cdots + x_n)/n$ and $-\infty < x_i < \infty$,
$i = 1, 2, \ldots, n$. Accordingly, with $y_1 = \bar{x}$ and $x_1 - \bar{x} = -y_2 - y_3 - \cdots - y_n$,
we find that the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is

$$(n)\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{(-y_2 - \cdots - y_n)^2}{2\sigma^2} - \frac{\sum_{2}^{n} y_i^2}{2\sigma^2} - \frac{n(y_1 - \mu)^2}{2\sigma^2}\right],$$

$-\infty < y_i < \infty$, $i = 1, 2, \ldots, n$. Note that this is the product of the
p.d.f. of $Y_1$, namely,

$$\frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left[-\frac{(y_1 - \mu)^2}{2\sigma^2/n}\right], \quad -\infty < y_1 < \infty,$$

and a function of $y_2, \ldots, y_n$. Thus $Y_1$ must be independent of
the $n - 1$ random variables $Y_2, Y_3, \ldots, Y_n$ and that function of
$y_2, \ldots, y_n$ is the joint p.d.f. of $Y_2, Y_3, \ldots, Y_n$. Moreover, this means
that $Y_1 = \bar{X}$ and thus

$$\frac{n(Y_1 - \mu)^2}{\sigma^2} = \frac{n(\bar{X} - \mu)^2}{\sigma^2} = W_1$$

are independent of

$$\frac{(-Y_2 - \cdots - Y_n)^2 + \sum_{2}^{n} Y_i^2}{\sigma^2} = \frac{\sum_{1}^{n} (X_i - \bar{X})^2}{\sigma^2} = \frac{nS^2}{\sigma^2} = W_2.$$

Since $W_1$ is the square of a standard normal variable, it is distributed
as $\chi^2(1)$. Also, we know that

$$W = \sum_{1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = W_1 + W_2$$

is $\chi^2(n)$. From the independence of $W_1$ and $W_2$, we have

$$E(e^{tW}) = E(e^{tW_1}) E(e^{tW_2})$$

or, equivalently,

$$(1 - 2t)^{-n/2} = (1 - 2t)^{-1/2} E(e^{tW_2}), \quad t < \tfrac{1}{2}.$$

Thus

$$E(e^{tW_2}) = (1 - 2t)^{-(n-1)/2}, \quad t < \tfrac{1}{2},$$

and hence $W_2 = nS^2/\sigma^2$ is $\chi^2(n - 1)$. The determination of the p.d.f. of
$S^2$ is an easy exercise from this result (see Exercise 4.99).

To summarize, we have established, in this section, three important
properties of $\bar{X}$ and $S^2$ when the sample arises from a distribution which
is $N(\mu, \sigma^2)$:

1. $\bar{X}$ is $N(\mu, \sigma^2/n)$.

2. $nS^2/\sigma^2$ is $\chi^2(n - 1)$.

3. $\bar{X}$ and $S^2$ are independent.

For illustration, as the result of properties (1), (2), and (3), we have
that $\sqrt{n}(\bar{X} - \mu)/\sigma$ is $N(0, 1)$. Thus, from the definition of Student's $t$,

$$T = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{nS^2/[\sigma^2(n - 1)]}} = \frac{\bar{X} - \mu}{S/\sqrt{n - 1}}$$

has a $t$-distribution with $n - 1$ degrees of freedom. It was a
random variable like this one that motivated Gosset's search for
the distribution of $T$. This $t$ statistic will play an important role in
statistical applications.
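
Because $S^2$ is defined in this section with divisor $n$, the statistic $T$ carries $\sqrt{n - 1}$ rather than $\sqrt{n}$ in its denominator. The Python sketch below, an addition with a hypothetical function name and hypothetical data, computes $T$ from a sample using this convention and checks that it agrees with the more familiar form $(\bar{x} - \mu)/(s/\sqrt{n})$, in which $s$ is the sample standard deviation computed with divisor $n - 1$ (the convention used by Python's `statistics.stdev`).

```python
import math
import statistics

def t_statistic(sample, mu):
    """T = (xbar - mu) / (S / sqrt(n - 1)), where S^2 uses divisor n."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / n  # S^2 as defined in this section
    return (xbar - mu) / math.sqrt(s2 / (n - 1))

sample = [4.1, 5.3, 3.8, 6.0, 5.2]  # hypothetical observations
mu0 = 5.0                           # hypothetical value of mu
t = t_statistic(sample, mu0)

# Equivalent form using the divisor-(n - 1) standard deviation.
t_alt = (statistics.mean(sample) - mu0) / (
    statistics.stdev(sample) / math.sqrt(len(sample))
)
print(t, math.isclose(t, t_alt))
```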

EXERCISES 

4.94. Let $\bar{X}$ be the mean of a random sample of size 5 from a normal
distribution with $\mu = 0$ and $\sigma^2 = 125$. Determine $c$ so that $\Pr(\bar{X} < c) =
0.90$.

4.95. If $\bar{X}$ is the mean of a random sample of size $n$ from a normal
distribution with mean $\mu$ and variance 100, find $n$ so that
$\Pr(\mu - 5 < \bar{X} < \mu + 5) = 0.954$.

4.96. Let $X_1, X_2, \ldots, X_{25}$ and $Y_1, Y_2, \ldots, Y_{25}$ be two independent random
samples from two normal distributions $N(0, 16)$ and $N(1, 9)$, respectively.
Let $\bar{X}$ and $\bar{Y}$ denote the corresponding sample means. Compute $\Pr(\bar{X} > \bar{Y})$.

4.97. Find the mean and variance of $S^2 = \sum_{1}^{n} (X_i - \bar{X})^2/n$, where
$X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$.

Hint: Find the mean and variance of $nS^2/\sigma^2$.

4.98. Let $S^2$ be the variance of a random sample of size 6 from the normal
distribution $N(\mu, 12)$. Find $\Pr(2.30 < S^2 < 22.2)$.

4.99. Find the p.d.f. of the sample variance $V = S^2$, provided that the
distribution from which the sample arises is $N(\mu, \sigma^2)$.

4.100. Let $\bar{X}$ and $\bar{Y}$ be the respective means of two independent random
samples, each of size 4, from the two respective normal distributions
$N(10, 9)$ and $N(3, 4)$. Compute $\Pr(\bar{X} > 2\bar{Y})$.

4.101. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from $N(0, \sigma^2)$.
(a) Find the constant $c$ so that $c(X_1 - X_2)/\sqrt{X_3^2 + X_4^2 + X_5^2}$ has a $t$-distribution.
(b) How many degrees of freedom are associated with this $T$?

4.102. If a random sample of size 2 is taken from a normal distribution with 
mean 7 and variance 8, find the probability that the absolute value of the 
difference of these two observations exceeds 2. 

4.103. Let $\bar{X}$ and $S^2$ be the mean and the variance of a random sample
of size 25 from a distribution that is $N(3, 100)$. Then evaluate
$\Pr(0 < \bar{X} < 6,\ 55.2 < S^2 < 145.6)$.

4.9 Expectations of Functions of Random Variables 

Let $X_1, X_2, \ldots, X_n$ denote random variables that have the joint
p.d.f. $f(x_1, x_2, \ldots, x_n)$. Let the random variable $Y$ be defined by
$Y = u(X_1, X_2, \ldots, X_n)$. In Section 4.7, we found that we could
compute expectations of functions of $Y$ without first finding the p.d.f.
of $Y$. Indeed, this fact was the basis of the moment-generating-function
procedure for finding the p.d.f. of $Y$. We can take advantage of this
fact in a number of other instances. Some illustrative examples will be
given.

Example 1. Say that $W$ is $N(0, 1)$, that $V$ is $\chi^2(r)$ with $r \ge 2$, and that $W$
and $V$ are independent. The mean of the random variable $T = W\sqrt{r/V}$ exists
and is zero because the graph of the p.d.f. of $T$ (see Section 4.4) is symmetric
about the vertical axis through $t = 0$. The variance of $T$, when it exists,
could be computed by integrating the product of $t^2$ and the p.d.f. of $T$.
But it seems much simpler to compute

$$\sigma_T^2 = E(T^2) = E\left(W^2 \frac{r}{V}\right) = E(W^2)\, E\left(\frac{r}{V}\right).$$

Now $W^2$ is $\chi^2(1)$, so $E(W^2) = 1$. Furthermore,

$$E\left(\frac{r}{V}\right) = \int_{0}^{\infty} \frac{r}{v} \cdot \frac{1}{2^{r/2}\, \Gamma(r/2)}\, v^{r/2 - 1} e^{-v/2}\, dv$$

exists if $r > 2$ and is given by

$$\frac{r\, \Gamma[(r - 2)/2]}{2\, \Gamma(r/2)} = \frac{r\, \Gamma[(r - 2)/2]}{2[(r - 2)/2]\, \Gamma[(r - 2)/2]} = \frac{r}{r - 2}.$$

Thus $\sigma_T^2 = r/(r - 2)$, $r > 2$.
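
The gamma-function step in this computation is easy to verify numerically. The short Python sketch below is an addition, not part of the text, and the function name is hypothetical. It evaluates $r\,\Gamma[(r - 2)/2] / [2\,\Gamma(r/2)]$ with the standard library's `math.gamma` and compares the result with $r/(r - 2)$ for a few values of $r > 2$.

```python
import math

def expected_r_over_v(r):
    """E(r/V) for V distributed chi-square(r), valid for r > 2,
    evaluated as r * Gamma((r - 2)/2) / (2 * Gamma(r/2))."""
    if r <= 2:
        raise ValueError("E(r/V) exists only for r > 2")
    return r * math.gamma((r - 2) / 2) / (2 * math.gamma(r / 2))

for r in (3, 5, 10, 30):
    print(r, expected_r_over_v(r), r / (r - 2))
```

As $r$ increases, $r/(r - 2)$ approaches 1, which is consistent with the $t$-distribution approaching the standard normal distribution for large degrees of freedom.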

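Example 2. Let $X_i$ denote a random variable with mean $\mu_i$ and variance
$\sigma_i^2$, $i = 1, 2, \ldots, n$. Let $X_1, X_2, \ldots, X_n$ be independent and let $k_1, k_2, \ldots, k_n$
denote real constants. We shall compute the mean and variance of a linear
function $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$. Because $E$ is a linear operator, the
mean of $Y$ is given by

$$\mu_Y = E(k_1 X_1 + k_2 X_2 + \cdots + k_n X_n)$$
$$\phantom{\mu_Y} = k_1 E(X_1) + k_2 E(X_2) + \cdots + k_n E(X_n)$$
$$\phantom{\mu_Y} = k_1 \mu_1 + k_2 \mu_2 + \cdots + k_n \mu_n = \sum_{1}^{n} k_i \mu_i.$$

The variance of $Y$ is given by

$$\sigma_Y^2 = E\{[(k_1 X_1 + \cdots + k_n X_n) - (k_1 \mu_1 + \cdots + k_n \mu_n)]^2\}$$
$$\phantom{\sigma_Y^2} = E\{[k_1(X_1 - \mu_1) + \cdots + k_n(X_n - \mu_n)]^2\}$$
$$\phantom{\sigma_Y^2} = E\left\{\sum_{i=1}^{n} k_i^2 (X_i - \mu_i)^2 + 2 \sum\sum_{i<j} k_i k_j (X_i - \mu_i)(X_j - \mu_j)\right\}$$
$$\phantom{\sigma_Y^2} = \sum_{i=1}^{n} k_i^2 E[(X_i - \mu_i)^2] + 2 \sum\sum_{i<j} k_i k_j E[(X_i - \mu_i)(X_j - \mu_j)].$$

Consider $E[(X_i - \mu_i)(X_j - \mu_j)]$, $i < j$. Because $X_i$ and $X_j$ are independent, we
have

$$E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i - \mu_i) E(X_j - \mu_j) = 0.$$

Finally, then,

$$\sigma_Y^2 = \sum_{i=1}^{n} k_i^2 E[(X_i - \mu_i)^2] = \sum_{i=1}^{n} k_i^2 \sigma_i^2.$$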

We can obtain a more general result if, in Example 2, we remove
the hypothesis of independence of $X_1, X_2, \ldots, X_n$. We shall do this and
we shall let $\rho_{ij}$ denote the correlation coefficient of $X_i$ and $X_j$. Thus for
easy reference to Example 2, we write

$$E[(X_i - \mu_i)(X_j - \mu_j)] = \rho_{ij} \sigma_i \sigma_j, \quad i < j.$$

If we refer to Example 2, we see that again $\mu_Y = \sum_{1}^{n} k_i \mu_i$. But now

$$\sigma_Y^2 = \sum_{1}^{n} k_i^2 \sigma_i^2 + 2 \sum\sum_{i<j} k_i k_j \rho_{ij} \sigma_i \sigma_j.$$

Thus we have the following theorem.

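Theorem 5. Let $X_1, \ldots, X_n$ denote random variables that have
means $\mu_1, \ldots, \mu_n$ and variances $\sigma_1^2, \ldots, \sigma_n^2$. Let $\rho_{ij}$, $i \ne j$, denote the
correlation coefficient of $X_i$ and $X_j$ and let $k_1, \ldots, k_n$ denote real
constants. The mean and the variance of the linear function

$$Y = \sum_{1}^{n} k_i X_i$$

are, respectively,

$$\mu_Y = \sum_{1}^{n} k_i \mu_i$$

and

$$\sigma_Y^2 = \sum_{1}^{n} k_i^2 \sigma_i^2 + 2 \sum\sum_{i<j} k_i k_j \rho_{ij} \sigma_i \sigma_j.$$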

The following corollary of this theorem is quite useful. 

Corollary. Let $X_1, \ldots, X_n$ denote the observations of a random
sample of size $n$ from a distribution that has mean $\mu$ and variance $\sigma^2$. The
mean and the variance of $Y = \sum_{1}^{n} k_i X_i$ are, respectively, $\mu_Y = \left(\sum_{1}^{n} k_i\right)\mu$ and
$\sigma_Y^2 = \left(\sum_{1}^{n} k_i^2\right)\sigma^2$.

Example 3. Let $\bar{X} = \sum_{1}^{n} X_i/n$ denote the mean of a random sample of size
$n$ from a distribution that has mean $\mu$ and variance $\sigma^2$. In accordance with
the corollary, we have $\mu_{\bar{X}} = \mu \sum_{1}^{n} (1/n) = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2 \sum_{1}^{n} (1/n)^2 = \sigma^2/n$. We
have seen, in Section 4.8, that if our sample is from a distribution that is
$N(\mu, \sigma^2)$, then $\bar{X}$ is $N(\mu, \sigma^2/n)$. It is interesting that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$
whether the sample is or is not from a normal distribution.
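
The formulas of Theorem 5 and its corollary can be collected in a small routine. The following Python sketch is an addition, not part of the text; the function name, the dictionary convention for the correlation coefficients, and the numerical values are all hypothetical. Correlation coefficients are supplied as a dictionary keyed by index pairs $(i, j)$ with $i < j$ (zero-based), and any pair not listed is treated as uncorrelated.

```python
import math

def mean_and_variance_of_linear_function(ks, mus, variances, rho=None):
    """Mean and variance of Y = sum k_i X_i (Theorem 5).

    rho maps zero-based pairs (i, j), i < j, to the correlation coefficient
    of X_i and X_j; missing pairs are taken to have correlation zero."""
    n = len(ks)
    if not (len(mus) == n and len(variances) == n):
        raise ValueError("ks, mus, and variances must have the same length")
    rho = rho or {}
    mean = sum(k * mu for k, mu in zip(ks, mus))
    variance = sum(k * k * v for k, v in zip(ks, variances))
    for i in range(n):
        for j in range(i + 1, n):
            sd_product = math.sqrt(variances[i] * variances[j])
            variance += 2 * ks[i] * ks[j] * rho.get((i, j), 0.0) * sd_product
    return mean, variance

# Two correlated variables with hypothetical parameters.
print(mean_and_variance_of_linear_function(
    ks=[3, -2], mus=[1, 4], variances=[4, 9], rho={(0, 1): 0.5}))

# Example 3: the mean of a random sample of size n = 4 (hypothetical mu, sigma^2).
n, mu, sigma2 = 4, 10.0, 8.0
print(mean_and_variance_of_linear_function([1 / n] * n, [mu] * n, [sigma2] * n))
```

With these hypothetical inputs the first call gives mean $-5$ and variance $36 + 36 - 36 = 36$, and the second gives mean $\mu = 10$ and variance $\sigma^2/n = 2$, in agreement with Example 3.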

EXERCISES 

4.104. Let $X_1, X_2, X_3, X_4$ be four i.i.d. random variables having the same p.d.f.
$f(x) = 2x$, $0 < x < 1$, zero elsewhere. Find the mean and variance of the
sum $Y$ of these four random variables.

4.105. Let $X_1$ and $X_2$ be two independent random variables so that the
variances of $X_1$ and $X_2$ are $\sigma_1^2 = k$ and $\sigma_2^2 = 2$, respectively. Given that the
variance of $Y = 3X_2 - X_1$ is 25, find $k$.

4.106. If the independent variables $X_1$ and $X_2$ have means $\mu_1$, $\mu_2$ and variances
$\sigma_1^2$, $\sigma_2^2$, respectively, show that the mean and variance of the product
$Y = X_1 X_2$ are $\mu_1 \mu_2$ and $\sigma_1^2 \sigma_2^2 + \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2$, respectively.

4.107. Find the mean and variance of the sum $Y$ of the observations of
a random sample of size 5 from the distribution having p.d.f. $f(x) =
6x(1 - x)$, $0 < x < 1$, zero elsewhere.

4.108. Determine the mean and variance of the mean $\bar{X}$ of a random sample
of size 9 from a distribution having p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero
elsewhere.

4.109. Let $X$ and $Y$ be random variables with $\mu_1 = 1$, $\mu_2 = 4$, $\sigma_1^2 = 4$, $\sigma_2^2 = 6$,
$\rho = \tfrac{1}{2}$. Find the mean and variance of $Z = 3X - 2Y$.

4.110. Let $X$ and $Y$ be independent random variables with means $\mu_1$, $\mu_2$ and
variances $\sigma_1^2$, $\sigma_2^2$. Determine the correlation coefficient of $X$ and $Z = X - Y$
in terms of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$.

4.111. Let $\mu$ and $\sigma^2$ denote the mean and variance of the random variable $X$.
Let $Y = c + bX$, where $b$ and $c$ are real constants. Show that the mean and
the variance of $Y$ are, respectively, $c + b\mu$ and $b^2 \sigma^2$.

4.112. Find the mean and the variance of Y= X { — 2X 2 + 3^ 3 , where 
X,, X 2 , are observations of a random sample from a chi-square 
distribution with 6 degrees of freedom. 

4.113. Let $X$ and $Y$ be random variables such that $\operatorname{var}(X) = 4$, $\operatorname{var}(Y) = 2$,
and $\operatorname{var}(X + 2Y) = 15$. Determine the correlation coefficient of $X$ and $Y$.

4.114. Let $X$ and $Y$ be random variables with means $\mu_1$, $\mu_2$; variances $\sigma_1^2$, $\sigma_2^2$;
and correlation coefficient $\rho$. Show that the correlation coefficient of
$W = aX + b$, $a > 0$, and $Z = cY + d$, $c > 0$, is $\rho$.

4.115. A person rolls a die, tosses a coin, and draws a card from an ordinary
deck. He receives \$3 for each point up on the die, \$10 for a head, \$0 for
a tail, and \$1 for each spot on the card (jack = 11, queen = 12, king = 13).
If we assume that the three random variables involved are independent and
uniformly distributed, compute the mean and variance of the amount to be
received.

4.116. Let $U$ and $V$ be two independent chi-square variables with $r_1$
and $r_2$ degrees of freedom, respectively. Find the mean and variance of
$F = (r_2 U)/(r_1 V)$. What restriction is needed on the parameters $r_1$ and $r_2$ in
order to ensure the existence of both the mean and the variance of $F$?

4.117. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution
with mean $\mu$ and variance $\sigma^2$. Show that $E(S^2) = (n - 1)\sigma^2/n$, where $S^2$ is
the variance of the random sample.

Hint: Write $S^2 = (1/n) \sum_{1}^{n} (X_i - \mu)^2 - (\bar X - \mu)^2$.

4.118. Let $X_1$ and $X_2$ be independent random variables with nonzero
variances. Find the correlation coefficient of $Y = X_1 X_2$ and $X_1$ in terms of
the means and variances of $X_1$ and $X_2$.

4.119. Let $X_1$ and $X_2$ have a joint distribution with parameters $\mu_1$, $\mu_2$,
$\sigma_1^2$, $\sigma_2^2$, and $\rho$. Find the correlation coefficient of the linear functions
$Y = a_1 X_1 + a_2 X_2$ and $Z = b_1 X_1 + b_2 X_2$ in terms of the real constants $a_1$, $a_2$,
$b_1$, $b_2$, and the parameters of the distribution.

4.120. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a distribution
which has mean $\mu$ and variance $\sigma^2$. Use Chebyshev's inequality to show, for
every $\epsilon > 0$, that $\lim_{n \to \infty} \Pr(|\bar X - \mu| < \epsilon) = 1$; this is another form of the law
of large numbers.

4.121. Let $X_1$, $X_2$, and $X_3$ be random variables with equal variances but with
correlation coefficients $\rho_{12} = 0.3$, $\rho_{13} = 0.5$, and $\rho_{23} = 0.2$. Find the
correlation coefficient of the linear functions $Y = X_1 + X_2$ and
$Z = X_2 + X_3$.

4.122. Find the variance of the sum of 10 random variables if each has 
variance 5 and if each pair has correlation coefficient 0.5. 

4.123. Let $X$ and $Y$ have the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Show that the
correlation coefficient of $X$ and $[Y - \rho(\sigma_2/\sigma_1)X]$ is zero.

4.124. Let $X_1$ and $X_2$ have a bivariate normal distribution with parameters $\mu_1$, $\mu_2$,
$\sigma_1^2$, $\sigma_2^2$, and $\rho$. Compute the means, the variances, and the correlation
coefficient of $Y_1 = \exp(X_1)$ and $Y_2 = \exp(X_2)$.

Hint: Various moments of $Y_1$ and $Y_2$ can be found by assigning
appropriate values to $t_1$ and $t_2$ in $E[\exp(t_1 X_1 + t_2 X_2)]$.

4.125. Let $X$ be $N(\mu, \sigma^2)$ and consider the transformation $X = \ln Y$ or,
equivalently, $Y = e^X$.

(a) Find the mean and the variance of $Y$ by first determining $E(e^X)$ and
$E[(e^X)^2]$. Hint: Use the m.g.f. of $X$.

(b) Find the p.d.f. of $Y$. This is the p.d.f. of the lognormal distribution.

4.126. Let $X_1$ and $X_2$ have a trinomial distribution with parameters $n$, $p_1$, $p_2$.

(a) What is the distribution of $Y = X_1 + X_2$?

(b) From the equality $\sigma_Y^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2$, once again determine the
correlation coefficient $\rho$ of $X_1$ and $X_2$.

4.127. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_2 + X_3$, where $X_1$, $X_2$, and $X_3$ are three
independent random variables. Find the joint m.g.f. and the correlation
coefficient of $Y_1$ and $Y_2$ provided that:

(a) $X_i$ has a Poisson distribution with mean $\mu_i$, $i = 1, 2, 3$.

(b) $X_i$ is $N(\mu_i, \sigma_i^2)$, $i = 1, 2, 3$.

4.128. Let $X_1, \ldots, X_n$ be random variables that have means $\mu_1, \ldots, \mu_n$ and
variances $\sigma_1^2, \ldots, \sigma_n^2$. Let $\rho_{ij}$, $i \ne j$, denote the correlation coefficient of
$X_i$ and $X_j$. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be real constants. Show that the
covariance of $Y = \sum_{i=1}^{n} a_i X_i$ and $Z = \sum_{j=1}^{n} b_j X_j$ is
$\sum_{j=1}^{n} \sum_{i=1}^{n} a_i b_j \sigma_i \sigma_j \rho_{ij}$, where
$\rho_{ii} = 1$, $i = 1, 2, \ldots, n$.


*4.10 The Multivariate Normal Distribution 


We have studied in some detail normal distributions of one 
random variable. In this section we investigate a joint distribution 
of n random variables that will be called a multivariate normal 
distribution. This investigation assumes that the student is familiar 
with elementary matrix algebra, with real symmetric quadratic forms, 
and with orthogonal transformations. Henceforth, the expression 
quadratic form means a quadratic form in a prescribed number of 
variables whose matrix is real and symmetric. All symbols that
represent matrices will be set in boldface type. 

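Let $\mathbf{A}$ denote an $n \times n$ real symmetric matrix which is positive
definite. Let $\boldsymbol{\mu}$ denote the $n \times 1$ matrix such that $\boldsymbol{\mu}'$, the transpose of
$\boldsymbol{\mu}$, is $\boldsymbol{\mu}' = [\mu_1, \mu_2, \ldots, \mu_n]$, where each $\mu_i$ is a real constant. Finally, let
$\mathbf{x}$ denote the $n \times 1$ matrix such that $\mathbf{x}' = [x_1, x_2, \ldots, x_n]$. We shall
show that if $C$ is an appropriately chosen positive constant, the
nonnegative function

$$f(x_1, x_2, \ldots, x_n) = C \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right], \qquad -\infty < x_i < \infty, \quad i = 1, 2, \ldots, n,$$

is a joint p.d.f. of $n$ random variables $X_1, X_2, \ldots, X_n$ that are of the
continuous type. Thus we need to show that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1. \tag{1}$$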
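
Let $\mathbf{t}$ denote the $n \times 1$ matrix such that $\mathbf{t}' = [t_1, t_2, \ldots, t_n]$, where
$t_1, t_2, \ldots, t_n$ are arbitrary real numbers. We shall evaluate the integral

$$C \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[\mathbf{t}'\mathbf{x} - \frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right] dx_1 \cdots dx_n, \tag{2}$$

and then we shall subsequently set $t_1 = t_2 = \cdots = t_n = 0$, and thus
establish Equation (1). First, we change the variables of integration in
integral (2) from $x_1, x_2, \ldots, x_n$ to $y_1, y_2, \ldots, y_n$ by writing $\mathbf{x} - \boldsymbol{\mu} = \mathbf{y}$,
where $\mathbf{y}' = [y_1, y_2, \ldots, y_n]$. The Jacobian of the transformation is one
and the $n$-dimensional $x$-space is mapped onto an $n$-dimensional
$y$-space, so that integral (2) may be written as

$$C \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(\mathbf{t}'\mathbf{y} - \frac{\mathbf{y}'\mathbf{A}\mathbf{y}}{2}\right) dy_1 \cdots dy_n. \tag{3}$$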


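Because the real symmetric matrix $\mathbf{A}$ is positive definite, the $n$
characteristic numbers (proper values, latent roots, or eigenvalues)
$a_1, a_2, \ldots, a_n$ of $\mathbf{A}$ are positive. There exists an appropriately chosen
$n \times n$ real orthogonal matrix $\mathbf{L}$ ($\mathbf{L}' = \mathbf{L}^{-1}$, where $\mathbf{L}^{-1}$ is the inverse
of $\mathbf{L}$) such that

$$\mathbf{L}'\mathbf{A}\mathbf{L} = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_n \end{bmatrix}$$

for a suitable ordering of $a_1, a_2, \ldots, a_n$. We shall sometimes write
$\mathbf{L}'\mathbf{A}\mathbf{L} = \operatorname{diag}[a_1, a_2, \ldots, a_n]$. In integral (3), we shall change the
variables of integration from $y_1, y_2, \ldots, y_n$ to $z_1, z_2, \ldots, z_n$ by writing
$\mathbf{y} = \mathbf{L}\mathbf{z}$, where $\mathbf{z}' = [z_1, z_2, \ldots, z_n]$. The Jacobian of the transformation
is the determinant of the orthogonal matrix $\mathbf{L}$. Since $\mathbf{L}'\mathbf{L} = \mathbf{I}_n$, where
$\mathbf{I}_n$ is the unit matrix of order $n$, we have the determinant $|\mathbf{L}'\mathbf{L}| = 1$ and
$|\mathbf{L}|^2 = 1$. Thus the absolute value of the Jacobian is one. Moreover, the
$n$-dimensional $y$-space is mapped onto an $n$-dimensional $z$-space. The
integral (3) becomes

$$C \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[\mathbf{t}'\mathbf{L}\mathbf{z} - \frac{\mathbf{z}'(\mathbf{L}'\mathbf{A}\mathbf{L})\mathbf{z}}{2}\right] dz_1 \cdots dz_n. \tag{4}$$

It is computationally convenient to write, momentarily, $\mathbf{t}'\mathbf{L} = \mathbf{w}'$, where
$\mathbf{w}' = [w_1, w_2, \ldots, w_n]$. Then

$$\exp[\mathbf{t}'\mathbf{L}\mathbf{z}] = \exp[\mathbf{w}'\mathbf{z}] = \exp\left(\sum_{1}^{n} w_i z_i\right).$$

Moreover,

$$\exp\left[-\frac{\mathbf{z}'(\mathbf{L}'\mathbf{A}\mathbf{L})\mathbf{z}}{2}\right] = \exp\left[-\frac{\sum_{1}^{n} a_i z_i^2}{2}\right].$$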

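Then integral (4) may be written as the product of $n$ integrals in the
following manner:

$$C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\int_{-\infty}^{\infty} \exp\left(w_i z_i - \frac{a_i z_i^2}{2}\right) dz_i\right]
= C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\sqrt{\frac{2\pi}{a_i}} \int_{-\infty}^{\infty} \frac{\exp\left(w_i z_i - a_i z_i^2/2\right)}{\sqrt{2\pi/a_i}}\, dz_i\right]. \tag{5}$$

The integral that involves $z_i$ can be treated as the m.g.f., with the more
familiar symbol $t$ replaced by $w_i$, of a distribution which is $N(0, 1/a_i)$.
Thus the right-hand member of Equation (5) is equal to

$$C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\sqrt{\frac{2\pi}{a_i}} \exp\left(\frac{w_i^2}{2a_i}\right)\right]
= C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \sqrt{\frac{(2\pi)^n}{a_1 a_2 \cdots a_n}} \exp\left(\sum_{1}^{n} \frac{w_i^2}{2a_i}\right). \tag{6}$$

Now, because $\mathbf{L}^{-1} = \mathbf{L}'$, we have

$$(\mathbf{L}'\mathbf{A}\mathbf{L})^{-1} = \mathbf{L}'\mathbf{A}^{-1}\mathbf{L} = \operatorname{diag}\left[\frac{1}{a_1}, \frac{1}{a_2}, \ldots, \frac{1}{a_n}\right].$$

Thus

$$\sum_{1}^{n} \frac{w_i^2}{a_i} = \mathbf{w}'(\mathbf{L}'\mathbf{A}^{-1}\mathbf{L})\mathbf{w} = (\mathbf{L}\mathbf{w})'\mathbf{A}^{-1}(\mathbf{L}\mathbf{w}) = \mathbf{t}'\mathbf{A}^{-1}\mathbf{t}.$$

Moreover, the determinant $|\mathbf{A}^{-1}|$ of $\mathbf{A}^{-1}$ is

$$|\mathbf{A}^{-1}| = |\mathbf{L}'\mathbf{A}^{-1}\mathbf{L}| = \frac{1}{a_1 a_2 \cdots a_n}.$$

Accordingly, the right-hand member of Equation (6), which is equal
to integral (2), may be written as

$$C e^{\mathbf{t}'\boldsymbol{\mu}} \sqrt{(2\pi)^n |\mathbf{A}^{-1}|}\, \exp\left(\frac{\mathbf{t}'\mathbf{A}^{-1}\mathbf{t}}{2}\right). \tag{7}$$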

If, in this function, we set $t_1 = t_2 = \cdots = t_n = 0$, we have the value of
the left-hand member of Equation (1). Thus we have

$$C \sqrt{(2\pi)^n |\mathbf{A}^{-1}|} = 1.$$

Accordingly, the function

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{|\mathbf{A}^{-1}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right],$$

$-\infty < x_i < \infty$, $i = 1, 2, \ldots, n$, is a joint p.d.f. of $n$ random variables
$X_1, X_2, \ldots, X_n$ that are of the continuous type. Such a p.d.f. is called
a nonsingular multivariate normal p.d.f.

We have now proved that $f(x_1, x_2, \ldots, x_n)$ is a p.d.f. However,
we have proved more than that. Because $f(x_1, x_2, \ldots, x_n)$ is a p.d.f.,
integral (2) is the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of this joint distribution of
probability. Since integral (2) is equal to function (7), the m.g.f. of the
multivariate normal distribution is given by

$$M(t_1, t_2, \ldots, t_n) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{A}^{-1}\mathbf{t}}{2}\right).$$

Let the elements of the real, symmetric, and positive definite matrix
$\mathbf{A}^{-1}$ be denoted by $\sigma_{ij}$, $i, j = 1, 2, \ldots, n$. Then

$$M(0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i \mu_i + \frac{\sigma_{ii} t_i^2}{2}\right)$$

is the m.g.f. of $X_i$, $i = 1, 2, \ldots, n$. Thus $X_i$ is $N(\mu_i, \sigma_{ii})$, $i = 1, 2, \ldots, n$.

Moreover, with $i \ne j$, we see that $M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0)$,
the m.g.f. of $X_i$ and $X_j$, is equal to

$$\exp\left(t_i \mu_i + t_j \mu_j + \frac{\sigma_{ii} t_i^2 + 2\sigma_{ij} t_i t_j + \sigma_{jj} t_j^2}{2}\right),$$

which is the m.g.f. of a bivariate normal distribution. In Exercise 4.131
the reader is asked to show that $\sigma_{ij}$ is the covariance of the random
variables $X_i$ and $X_j$. Thus the matrix $\boldsymbol{\mu}$, where $\boldsymbol{\mu}' = [\mu_1, \mu_2, \ldots, \mu_n]$,
is the matrix of the means of the random variables $X_1, \ldots, X_n$.
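Moreover, the elements on the principal diagonal of $\mathbf{A}^{-1}$ are,
respectively, the variances $\sigma_{ii} = \sigma_i^2$, $i = 1, 2, \ldots, n$, and the elements
not on the principal diagonal of $\mathbf{A}^{-1}$ are, respectively, the covariances
$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$, $i \ne j$, of the random variables $X_1, X_2, \ldots, X_n$. We call the
matrix $\mathbf{A}^{-1}$, which is given by

$$\begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn} \end{bmatrix},$$

the covariance matrix of the multivariate normal distribution and
henceforth we shall denote this matrix by the symbol $\mathbf{V}$. In terms
of the positive definite covariance matrix $\mathbf{V}$, the multivariate normal
p.d.f. is written

$$\frac{1}{(2\pi)^{n/2} \sqrt{|\mathbf{V}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{2}\right], \qquad -\infty < x_i < \infty,$$

$i = 1, 2, \ldots, n$, and the m.g.f. of this distribution is given by

$$\exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{V}\mathbf{t}}{2}\right)$$

for all real values of $\mathbf{t}$.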
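
For $n = 2$ the p.d.f. written with $\mathbf{V}$ can be compared numerically with the familiar bivariate normal p.d.f. The Python sketch below is only an illustration under stated assumptions: the function names and all numerical values of the parameters are hypothetical, and the covariance matrix is taken to have diagonal elements $\sigma_1^2$, $\sigma_2^2$ and off-diagonal element $\rho\sigma_1\sigma_2$.

```python
import math


def mvn_pdf_2d(x, mu, V):
    """Multivariate normal p.d.f. for n = 2, written with the covariance matrix V."""
    (a, b), (c, d) = V
    det = a * d - b * c                       # |V|
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)' V^{-1} (x - mu).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))


def bivariate_normal_pdf(x, mu, s1, s2, rho):
    """Bivariate normal p.d.f. written with sigma_1, sigma_2, and rho."""
    z1 = (x[0] - mu[0]) / s1
    z2 = (x[1] - mu[1]) / s2
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))


# Hypothetical parameter values chosen only for the comparison.
mu = (1.0, 0.5)
s1, s2, rho = 2.0, 1.5, 0.4
V = [[s1 * s1, rho * s1 * s2], [rho * s1 * s2, s2 * s2]]
point = (0.3, -1.2)

print(mvn_pdf_2d(point, mu, V))
print(bivariate_normal_pdf(point, mu, s1, s2, rho))
```

The two printed values agree up to rounding, which is the numerical counterpart of the comparison requested in Exercise 4.130.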

Note that this m.g.f. equals the product of $n$ functions, where
the first is a function of $t_1$ alone, the second is a function of $t_2$ alone,
and so on, if and only if $\mathbf{V}$ is a diagonal matrix. This condition,
$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j = 0$, means $\rho_{ij} = 0$, $i \ne j$. That is, the multivariate normal
random variables are independent if and only if $\rho_{ij} = 0$ for all $i \ne j$.

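Example 1. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution
with matrix $\boldsymbol{\mu}$ of means and positive definite covariance matrix $\mathbf{V}$. If we let
$\mathbf{X}' = [X_1, X_2, \ldots, X_n]$, then the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of this joint
distribution of probability is

$$E(e^{\mathbf{t}'\mathbf{X}}) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{V}\mathbf{t}}{2}\right). \tag{8}$$

Consider a linear function $Y$ of $X_1, X_2, \ldots, X_n$ which is defined by
$Y = \mathbf{c}'\mathbf{X} = \sum_{1}^{n} c_i X_i$, where $\mathbf{c}' = [c_1, c_2, \ldots, c_n]$ and the several $c_i$ are real and not
all zero. We wish to find the p.d.f. of $Y$. The m.g.f. $m(t)$ of the distribution
of $Y$ is given by

$$m(t) = E(e^{tY}) = E(e^{t\mathbf{c}'\mathbf{X}}).$$

Now the expectation (8) exists for all real values of $\mathbf{t}$. Thus we can replace $\mathbf{t}'$
in expectation (8) by $t\mathbf{c}'$ and obtain

$$m(t) = \exp\left(t\mathbf{c}'\boldsymbol{\mu} + \frac{\mathbf{c}'\mathbf{V}\mathbf{c}\, t^2}{2}\right).$$

Thus the random variable $Y$ is $N(\mathbf{c}'\boldsymbol{\mu}, \mathbf{c}'\mathbf{V}\mathbf{c})$.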
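
The two parameters of the distribution of $Y$ are simple matrix products, and a short Python sketch may make the arithmetic concrete. The function name `linear_form_parameters` and the numerical values of $\boldsymbol{\mu}$, $\mathbf{V}$, and $\mathbf{c}$ below are hypothetical and serve only as an illustration.

```python
def linear_form_parameters(c, mu, V):
    """Return (c'mu, c'Vc), the mean and variance of Y = c'X."""
    n = len(c)
    mean = sum(c[i] * mu[i] for i in range(n))
    var = sum(c[i] * V[i][j] * c[j] for i in range(n) for j in range(n))
    return mean, var


# Hypothetical example with n = 2: variances 4 and 9, covariance 1.
mu = [1, 2]
V = [[4, 1],
     [1, 9]]

print(linear_form_parameters([1, -1], mu, V))  # parameters of X1 - X2
print(linear_form_parameters([1, 1], mu, V))   # parameters of X1 + X2
```

For $\mathbf{c}' = [1, -1]$ the variance is $4 - 2(1) + 9 = 11$, and for $\mathbf{c}' = [1, 1]$ it is $4 + 2(1) + 9 = 15$; the covariance enters with the sign of $c_1 c_2$.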



EXERCISES 

4.129. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with
positive definite covariance matrix $\mathbf{V}$. Prove that these random variables are
mutually independent if and only if $\mathbf{V}$ is a diagonal matrix.


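4.130. Let $n = 2$ and take

$$\mathbf{V} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}.$$

Determine $|\mathbf{V}|$, $\mathbf{V}^{-1}$, and $(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})$. Compare the bivariate normal
p.d.f. of Section 3.5 with this multivariate normal p.d.f. when $n = 2$.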

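4.131. Let $m(t_i, t_j)$ represent the m.g.f. of $X_i$ and $X_j$ as given in the text.
Show that

$$\frac{\partial^2 m(0, 0)}{\partial t_i\, \partial t_j} - \left[\frac{\partial m(0, 0)}{\partial t_i}\right]\left[\frac{\partial m(0, 0)}{\partial t_j}\right] = \sigma_{ij};$$

that is, prove that the covariance of $X_i$ and $X_j$ is $\sigma_{ij}$, which appears in
that formula for $m(t_i, t_j)$.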

4.132. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution, where $\boldsymbol{\mu}$
is the matrix of the means and $\mathbf{V}$ is the positive definite covariance matrix.
Let $Y = \mathbf{c}'\mathbf{X}$ and $Z = \mathbf{d}'\mathbf{X}$, where $\mathbf{X}' = [X_1, \ldots, X_n]$, $\mathbf{c}' = [c_1, \ldots, c_n]$, and
$\mathbf{d}' = [d_1, \ldots, d_n]$ are real matrices.

(a) Find $m(t_1, t_2) = E(e^{t_1 Y + t_2 Z})$ to see that $Y$ and $Z$ have a bivariate normal
distribution.

(b) Prove that $Y$ and $Z$ are independent if and only if $\mathbf{c}'\mathbf{V}\mathbf{d} = 0$.

(c) If $X_1, X_2, \ldots, X_n$ are independent random variables which have the
same variance $\sigma^2$, show that the necessary and sufficient condition of
part (b) becomes $\mathbf{c}'\mathbf{d} = 0$.

4.133. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ have the multivariate normal distribution of
Exercise 4.132. Consider the $p$ linear functions of $X_1, \ldots, X_n$ defined by
$\mathbf{W} = \mathbf{B}\mathbf{X}$, where $\mathbf{W}' = [W_1, \ldots, W_p]$, $p \le n$, and $\mathbf{B}$ is a $p \times n$ real matrix of
rank $p$. Find $m(v_1, \ldots, v_p) = E(e^{\mathbf{v}'\mathbf{W}})$, where $\mathbf{v}'$ is the real matrix $[v_1, \ldots, v_p]$,
to see that $W_1, \ldots, W_p$ have a $p$-variate normal distribution which has $\mathbf{B}\boldsymbol{\mu}$
for the matrix of the means and $\mathbf{B}\mathbf{V}\mathbf{B}'$ for the covariance matrix.

4.134. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ have the $n$-variate normal distribution
of Exercise 4.132. Show that $X_1, X_2, \ldots, X_p$, $p < n$, have a $p$-variate
normal distribution. What submatrix of $\mathbf{V}$ is the covariance matrix of
$X_1, X_2, \ldots, X_p$?

Hint: In the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of $X_1, X_2, \ldots, X_n$, let
$t_{p+1} = \cdots = t_n = 0$.

ADDITIONAL EXERCISES 

4.135. If $X$ has the p.d.f. $f(x) = \frac{1}{3}$, $-1 < x < 2$, zero elsewhere, find the p.d.f.
of $Y = X^4$.

4.136. The continuous random variable X has a p.d.f. given by f{x) = 1, 
0 < x < 1, zero elsewhere. The random variable Y is such that 
$Y = -2\ln X$. What is the distribution of $Y$? What are the mean and the
variance of $Y$?

4.137. Let $X_1$, $X_2$ be a random sample of size $n = 2$ from a Poisson
distribution with mean $\mu$. If $\Pr(X_1 + X_2 = 3) = (\frac{32}{3})e^{-4}$, compute $\Pr(X_1 = 2, X_2 = 4)$.

4.138. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size $n = 25$ from a
distribution with p.d.f. $f(x) = 3/x^4$, $1 < x < \infty$, zero elsewhere. Let $Y$ equal
the number of these $X$ values less than or equal to 2. What is the distribution
of $Y$?

4.139. Find the probability that the range of a random sample of size 3 from
the uniform distribution over the interval $(-5, 5)$ is less than 7.

4.140. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a sample of size 3 from a
distribution having p.d.f. $f(x) = \frac{1}{3}$, $-1 < x < 2$, zero elsewhere. Determine
$\Pr[-4 < Y_2 < 5]$.

4.141. Let $X$ and $Y$ be random variables so that $Z = X - 2Y$ has variance
equal to 28. If $\sigma_X^2 = 4$ and $\rho_{XY} = \frac{1}{2}$, find the variance $\sigma_Y^2$ of $Y$.

4.142. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample
of size $n = 4$ from a distribution with p.d.f. $f(x) = 2(1 - x)$, $0 < x < 1$,
zero elsewhere. Compute $\Pr(Y_1 < 0.1)$.

4.143. A certain job is completed in three steps in series. The means and 
standard deviations for the steps are (in hours): 


Step Mean Standard Deviation 



Assuming normal distributions and independent steps, compute the
probability that the job will take less than 7.6 hours to complete.

4.144. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution
having mean $\mu$ and variance 25. Use Chebyshev's inequality to determine
the smallest value of $n$ so that 0.75 is a lower bound for $\Pr[|\bar X - \mu| \le 1]$.

4.145. Let $X_1$ and $X_2$ be independent random variables with joint p.d.f.

$f(x_1, x_2)$, $x_1 = 1, 2, 3$, $x_2 = 1, 2, 3$,

and zero elsewhere. Find the p.d.f. of $Y = X_1 - X_2$.

4.146. An unbiased die is cast eight independent times. Let $Y$ be the smallest
of the eight numbers obtained. Find the p.d.f. of $Y$.

4.147. Let $X_1, X_2, X_3$ be i.i.d. $N(\mu, \sigma^2)$ and define

y, = 

and 

Y 2 = X 2 + 6X 3 . 

(a) Find the means and variances of $Y_1$ and $Y_2$ and their correlation
coefficient.

(b) Find the joint m.g.f. of $Y_1$ and $Y_2$.

4.148. The following were obtained from two sets of data: 

$n_1 = 20$, $\bar x = 25$, $s_x^2 = 5$,

$n_2 = 30$, $\bar y = 20$, $s_y^2 = 4$.

Find the mean and variance of the combined sample. 

4.149. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample
of size 5 from a distribution that has the p.d.f. $f(x) = 1$, $0 < x < 1$, zero
elsewhere. Compute Pr (y, 

4.150. Let $M(t) = (1 - t)^{-3}$, $t < 1$, be the m.g.f. of $X$. Find the m.g.f. of

$$Y = \frac{X - 10}{25}.$$

4.151. Let $\bar X$ be the mean of a random sample of size $n$ from a normal
distribution with mean $\mu$ and variance $\sigma^2 = 64$. Find $n$ so that

$$\Pr(\mu - 6 < \bar X < \mu + 6) = 0.9973.$$

4.152. Find the probability of obtaining a total of 14 in one toss of four dice. 

4.153. Two independent random samples, each of size 6, are taken from two
normal distributions having common variance $\sigma^2$. If $W_1$ and $W_2$ are the
variances of these respective samples, find the constant $k$ such that

4.154. The mean and variance of 9 observations are 4 and 14, respectively. 
We find that a tenth observation equals 6. Find the mean and the variance 
of the 10 observations. 

4.155. Draw 15 cards at random and without replacement from a pack of 25 
cards numbered 1, 2, 3,..., 25. Find the probability that 10 is the median 
of the cards selected. 

4.156. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of
size $n = 4$ from a uniform distribution over the interval $(0, 1)$.

(a) Find the joint p.d.f. of $Y_1$ and $Y_4$.

(b) Determine the conditional p.d.f. of $Y_2$ and $Y_3$, given $Y_1 = y_1$ and
$Y_4 = y_4$.

(c) Find the joint p.d.f. of $Z_1 = Y_1/Y_4$ and $Z_2 = Y_4$.

4.157. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
mean $\mu$ and variance $\sigma^2$. Consider the second differences

$$Z_j = X_{j+2} - 2X_{j+1} + X_j, \qquad j = 1, 2, \ldots, n - 2.$$

Compute the variance of the average, $\sum_{j=1}^{n-2} Z_j/(n - 2)$, of the second
differences.

4.158. Let $X$ and $Y$ have a bivariate normal distribution. Show that $X + Y$
and $X - Y$ are independent if and only if $\sigma_1^2 = \sigma_2^2$.

4.159. Let $X$ be a Poisson random variable with mean $\mu$. If the conditional
distribution of $Y$, given $X = x$, is $b(x, p)$, show that $Y$ has a Poisson
distribution and is independent of $X - Y$.

4.160. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Show that the
sample mean $\bar X$ and each $X_i - \bar X$, $i = 1, 2, \ldots, n$, are independent. Actually
$\bar X$ and the vector $(X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X)$ are independent and this
implies that $\bar X$ and $\sum_{i=1}^{n} (X_i - \bar X)^2$ are independent. Thus we could find the
joint distribution of $\bar X$ and $nS^2/\sigma^2$ using this result.

4.161. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
p.d.f. $f(x) = \frac{1}{6}$, $x = 1, 2, \ldots, 6$, zero elsewhere. Let $Y = \min(X_i)$ and
$Z = \max(X_i)$. Say that the joint distribution function of $Y$ and $Z$ is
$G(y, z) = \Pr(Y \le y, Z \le z)$, where $y$ and $z$ are nonnegative integers such
that $1 \le y \le z \le 6$.



= 0 . 10 . 
(a) Show that

$$G(y, z) = F^n(z) - [F(z) - F(y)]^n, \qquad 1 \le y \le z \le 6,$$

where $F(x)$ is the distribution function associated with $f(x)$.

Hint: Note that the event $(Z \le z) = (Y \le y, Z \le z) \cup (y < Y, Z \le z)$.

(b) Find the joint p.d.f. of $Y$ and $Z$ by evaluating

$$g(y, z) = G(y, z) - G(y - 1, z) - G(y, z - 1) + G(y - 1, z - 1).$$

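4.162. Let $\mathbf{X} = (X_1, X_2, X_3)'$ have a multivariate normal distribution with
mean vector $\boldsymbol{\mu} = (6, -2, 1)'$ and covariance matrix

$$\mathbf{V} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ -1 & 1 & 3 \end{bmatrix}.$$

Find the joint p.d.f. of

$$Y_1 = 3X_1 + X_2 - 2X_3 \quad \text{and} \quad Y_2 = X_1 - 5X_2 + X_3.$$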

4.163. If

$$\mathbf{V} = \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix}$$

is a covariance matrix, what can be said about the value of $\rho$?



CHAPTER 5


Limiting Distributions


5.1 Convergence in Distribution 

In some of the preceding chapters it has been demonstrated by
example that the distribution of a random variable (perhaps a statistic)
often depends upon a positive integer $n$. For example, if the random
variable $X$ is $b(n, p)$, the distribution of $X$ depends upon $n$. If $\bar X$ is the
mean of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$,
then $\bar X$ is itself $N(\mu, \sigma^2/n)$ and the distribution of $\bar X$ depends upon $n$. If
$S^2$ is the variance of this random sample from the normal distribution
to which we have just referred, the random variable $nS^2/\sigma^2$ is $\chi^2(n - 1)$,
and so the distribution of this random variable depends upon $n$.

We know from experience that the determination of the probability
density function of a random variable can, upon occasion, present
rather formidable computational difficulties. For example, if $\bar X$ is the
mean of a random sample $X_1, X_2, \ldots, X_n$ from a distribution that has
the following p.d.f.

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0 & \text{elsewhere,} \end{cases}$$

then (Exercise 4.85) the m.g.f. of $\bar X$ is given by $[M(t/n)]^n$, where here

$$M(t) = \int_{0}^{1} e^{tx}\, dx = \begin{cases} \dfrac{e^t - 1}{t}, & t \ne 0, \\ 1, & t = 0. \end{cases}$$

Hence

$$E(e^{t\bar X}) = \begin{cases} \left(\dfrac{e^{t/n} - 1}{t/n}\right)^n, & t \ne 0, \\ 1, & t = 0. \end{cases}$$

Since the m.g.f. of $\bar X$ depends upon $n$, the distribution of $\bar X$ depends
upon $n$. It is true that various mathematical techniques can be used to
determine the p.d.f. of $\bar X$ for a fixed, but arbitrarily fixed, positive
integer $n$. But the p.d.f. is so complicated that few, if any, of us would
be interested in using it to compute probabilities about $\bar X$. One of the
purposes of this chapter is to provide ways of approximating, for large
values of $n$, some of these complicated probability density functions.
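
To see concretely how this m.g.f. depends upon $n$, the following Python sketch evaluates $[M(t/n)]^n$ for the uniform p.d.f. above at one fixed value of $t$. The function names are hypothetical, and the comparison value $e^{t/2}$ (the m.g.f. of a distribution degenerate at $\frac{1}{2}$) is included here only as an aside that anticipates the limiting arguments developed later in the chapter.

```python
import math


def mgf_uniform(t):
    """M(t) = (e^t - 1)/t for t != 0 and M(0) = 1, for the uniform p.d.f. on (0, 1)."""
    return 1.0 if t == 0 else math.expm1(t) / t


def mgf_sample_mean(t, n):
    """M.g.f. of the mean of a random sample of size n: [M(t/n)]^n."""
    return mgf_uniform(t / n) ** n


t = 1.0
for n in (1, 10, 100, 1000):
    print(n, mgf_sample_mean(t, n))
print("e^(t/2) =", math.exp(t / 2))
```

The printed values change with $n$ and settle near $e^{t/2}$, which suggests that for large $n$ the distribution of $\bar X$ concentrates near the mean $\frac{1}{2}$ of the parent distribution.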

Consider a distribution that depends upon the positive integer $n$.
Clearly, the distribution function $F$ of that distribution will also
depend upon $n$. Throughout this chapter, we denote this fact by writing
the distribution function as $F_n$ and the corresponding p.d.f. as $f_n$.
Moreover, to emphasize the fact that we are working with sequences
of distribution functions and random variables, we place a subscript
$n$ on the random variables. For example, we shall write

$$F_n(\bar x) = \int_{-\infty}^{\bar x} \frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\, e^{-nw^2/2}\, dw$$

for the distribution function of the mean $\bar X_n$ of a random sample of size
$n$ from a normal distribution with mean zero and variance 1.

We now define convergence in distribution of a sequence of
random variables.


Definition 1. Let the distribution function $F_n(y)$ of the random
variable $Y_n$ depend upon $n$, $n = 1, 2, 3, \ldots$. If $F(y)$ is a distribution
function and if $\lim_{n \to \infty} F_n(y) = F(y)$ for every point $y$ at which $F(y)$ is
continuous, then the sequence of random variables, $Y_1, Y_2, \ldots$,
converges in distribution to a random variable with distribution
function $F(y)$.

The following examples are illustrative of this convergence in 
distribution. 


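Example 1. Let $Y_n$ denote the $n$th order statistic of a random sample
$X_1, X_2, \ldots, X_n$ from a distribution having p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta, \quad 0 < \theta < \infty, \\ 0 & \text{elsewhere.} \end{cases}$$

The p.d.f. of $Y_n$ is

$$g_n(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 < y < \theta, \\ 0 & \text{elsewhere,} \end{cases}$$

and the distribution function of $Y_n$ is

$$F_n(y) = \begin{cases} 0, & y < 0, \\ \displaystyle\int_{0}^{y} \frac{n z^{n-1}}{\theta^n}\, dz = \left(\frac{y}{\theta}\right)^n, & 0 \le y < \theta, \\ 1, & \theta \le y < \infty. \end{cases}$$

Then

$$\lim_{n \to \infty} F_n(y) = \begin{cases} 0, & -\infty < y < \theta, \\ 1, & \theta \le y < \infty. \end{cases}$$

Now

$$F(y) = \begin{cases} 0, & -\infty < y < \theta, \\ 1, & \theta \le y < \infty, \end{cases}$$

is a distribution function. Moreover, $\lim_{n \to \infty} F_n(y) = F(y)$ at each point of
continuity of $F(y)$. Recall that a distribution of the discrete type which has
a probability of 1 at a single point has been called a degenerate distribution.
Thus, in this example, the sequence of the $n$th order statistics, $Y_n$,
$n = 1, 2, 3, \ldots$, converges in distribution to a random variable that has a
degenerate distribution at the point $y = \theta$.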
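
The convergence in Example 1 can be watched numerically. The sketch below tabulates $F_n(y) = (y/\theta)^n$ at a few points for increasing $n$; the function name `cdf_max_uniform` and the value $\theta = 1$ are assumptions chosen only for illustration.

```python
def cdf_max_uniform(y, n, theta):
    """Distribution function of the n-th order statistic of a uniform (0, theta) sample."""
    if y < 0:
        return 0.0
    if y < theta:
        return (y / theta) ** n
    return 1.0


theta = 1.0  # hypothetical value of the parameter
for n in (1, 10, 100, 1000):
    row = [cdf_max_uniform(y, n, theta) for y in (0.5, 0.9, 0.99, 1.0)]
    print(n, row)
```

At every $y < \theta$ the tabulated values fall toward 0 as $n$ grows, while at $y = \theta$ the value is 1 for every $n$, which matches the degenerate limiting distribution function $F(y)$.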
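
Example 2. Let $\bar X_n$ have the distribution function

$$F_n(\bar x) = \int_{-\infty}^{\bar x} \frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\, e^{-nw^2/2}\, dw.$$

If the change of variable $v = \sqrt{n}\, w$ is made, we have

$$F_n(\bar x) = \int_{-\infty}^{\sqrt{n}\,\bar x} \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}\, dv.$$

It is clear that

$$\lim_{n \to \infty} F_n(\bar x) = \begin{cases} 0, & \bar x < 0, \\ \frac{1}{2}, & \bar x = 0, \\ 1, & \bar x > 0. \end{cases}$$

Now the function

$$F(\bar x) = \begin{cases} 0, & \bar x < 0, \\ 1, & \bar x \ge 0, \end{cases}$$

is a distribution function and $\lim_{n \to \infty} F_n(\bar x) = F(\bar x)$ at every point of continuity of
$F(\bar x)$. To be sure, $\lim_{n \to \infty} F_n(0) \ne F(0)$, but $F(\bar x)$ is not continuous at $\bar x = 0$.
Accordingly, the sequence $\bar X_1, \bar X_2, \bar X_3, \ldots$ converges in distribution to a
random variable that has a degenerate distribution at $\bar x = 0$.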
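
After the change of variable, $F_n(\bar x)$ is the standard normal distribution function evaluated at $\sqrt{n}\,\bar x$, so it can be computed with the error function. The following Python sketch, with hypothetical function names, tabulates $F_n$ at a negative point, at zero, and at a positive point.

```python
import math


def std_normal_cdf(v):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def cdf_sample_mean(x, n):
    """F_n(x) for the mean of a random sample of size n from N(0, 1)."""
    return std_normal_cdf(math.sqrt(n) * x)


for n in (1, 100, 10000):
    print(n, cdf_sample_mean(-0.1, n), cdf_sample_mean(0.0, n), cdf_sample_mean(0.1, n))
```

The middle column equals $\frac{1}{2}$ for every $n$, while the outer columns move to 0 and 1. The value at the single discontinuity point of $F$ never approaches $F(0) = 1$, yet the definition of convergence in distribution is still satisfied.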

Example 3. Even if a sequence $X_1, X_2, X_3, \ldots$ converges in distribution to
a random variable $X$, we cannot in general determine the distribution of $X$ by
taking the limit of the p.d.f. of $X_n$. This is illustrated by letting $X_n$ have the
p.d.f.

$$f_n(x) = \begin{cases} 1, & x = 2 + \dfrac{1}{n}, \\ 0 & \text{elsewhere.} \end{cases}$$

Clearly, $\lim_{n \to \infty} f_n(x) = 0$ for all values of $x$. This may suggest that $X_n$,
$n = 1, 2, 3, \ldots$, does not converge in distribution. However, the distribution
function of $X_n$ is

$$F_n(x) = \begin{cases} 0, & x < 2 + \dfrac{1}{n}, \\ 1, & x \ge 2 + \dfrac{1}{n}, \end{cases}$$

and

$$\lim_{n \to \infty} F_n(x) = \begin{cases} 0, & x \le 2, \\ 1, & x > 2. \end{cases}$$

Since

$$F(x) = \begin{cases} 0, & x < 2, \\ 1, & x \ge 2, \end{cases}$$

is a distribution function, and since $\lim_{n \to \infty} F_n(x) = F(x)$ at all points of
continuity of $F(x)$, the sequence $X_1, X_2, X_3, \ldots$ converges in distribution to
a random variable with distribution function $F(x)$.


It is interesting to note that although we refer to a sequence of
random variables, $X_1, X_2, X_3, \ldots$, converging in distribution to a
random variable $X$ having some distribution function $F(x)$, it is actually
the distribution functions $F_1, F_2, F_3, \ldots$ that converge. That is,

$$\lim_{n \to \infty} F_n(x) = F(x)$$

at all points $x$ for which $F(x)$ is continuous. For that reason we often
find it convenient to refer to $F(x)$ as the limiting distribution. Moreover,
it is then a little easier to say that $X_n$, representing the sequence
$X_1, X_2, X_3, \ldots$, has a limiting distribution with distribution function
$F(x)$. Henceforth, we use this terminology.

Example 4. Let $Y_n$ denote the $n$th order statistic of a random sample from
the uniform distribution of Example 1. Let $Z_n = n(\theta - Y_n)$. The p.d.f. of $Z_n$
is

$$h_n(z) = \begin{cases} \dfrac{(\theta - z/n)^{n-1}}{\theta^n}, & 0 < z < n\theta, \\ 0 & \text{elsewhere,} \end{cases}$$

and the distribution function of $Z_n$ is

$$G_n(z) = \begin{cases} 0, & z < 0, \\ \displaystyle\int_{0}^{z} \frac{(\theta - w/n)^{n-1}}{\theta^n}\, dw = 1 - \left(1 - \frac{z}{n\theta}\right)^n, & 0 \le z < n\theta, \\ 1, & n\theta \le z. \end{cases}$$

Hence

$$\lim_{n \to \infty} G_n(z) = \begin{cases} 0, & z \le 0, \\ 1 - e^{-z/\theta}, & 0 < z < \infty. \end{cases}$$

Now

$$G(z) = \begin{cases} 0, & z < 0, \\ 1 - e^{-z/\theta}, & 0 \le z, \end{cases}$$

is a distribution function that is everywhere continuous and $\lim_{n \to \infty} G_n(z) = G(z)$
at all points. Thus $Z_n$ has a limiting distribution with distribution function
$G(z)$. This affords us an example of a limiting distribution that is not
degenerate.
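
The limit in Example 4 rests on the calculus fact that $(1 - z/(n\theta))^n$ tends to $e^{-z/\theta}$. A short Python sketch, with hypothetical function names and the assumed value $\theta = 2$, compares $G_n(z)$ with $G(z)$ at one point.

```python
import math


def G_n(z, n, theta):
    """Distribution function of Z_n = n(theta - Y_n)."""
    if z < 0:
        return 0.0
    if z < n * theta:
        return 1.0 - (1.0 - z / (n * theta)) ** n
    return 1.0


def G(z, theta):
    """Limiting distribution function 1 - exp(-z/theta) for z >= 0."""
    return 0.0 if z < 0 else 1.0 - math.exp(-z / theta)


theta, z = 2.0, 1.0  # hypothetical values
for n in (1, 10, 100, 10**6):
    print(n, G_n(z, n, theta), G(z, theta))
```

The gap between the two columns shrinks as $n$ grows, and the limit is a nondegenerate distribution function.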

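Example 5. Let $T_n$ have a $t$-distribution with $n$ degrees of freedom,
$n = 1, 2, 3, \ldots$. Thus its distribution function is

$$F_n(t) = \int_{-\infty}^{t} \frac{\Gamma[(n + 1)/2]}{\sqrt{\pi n}\, \Gamma(n/2)}\, \frac{1}{(1 + y^2/n)^{(n+1)/2}}\, dy,$$

where the integrand is the p.d.f. $f_n(y)$ of $T_n$. Accordingly,

$$\lim_{n \to \infty} F_n(t) = \lim_{n \to \infty} \int_{-\infty}^{t} f_n(y)\, dy = \int_{-\infty}^{t} \lim_{n \to \infty} f_n(y)\, dy.$$

The change of the order of the limit and integration is justified because $|f_n(y)|$
is dominated by a function, like $10 f_1(y)$, with a finite integral. That is,

$$|f_n(y)| \le 10 f_1(y)$$

and

$$\int_{-\infty}^{t} 10 f_1(y)\, dy = \frac{10}{\pi}\left(\arctan t + \frac{\pi}{2}\right) < \infty$$

for all real $t$. Hence, here we can find the limiting distribution by finding the
limit of the p.d.f. of $T_n$. It is

$$\lim_{n \to \infty} f_n(y) = \lim_{n \to \infty} \left\{\frac{\Gamma[(n + 1)/2]}{\sqrt{n/2}\, \Gamma(n/2)}\right\} \lim_{n \to \infty} \left\{\frac{1}{(1 + y^2/n)^{1/2}}\right\} \lim_{n \to \infty} \left\{\frac{1}{\sqrt{2\pi}} \left(1 + \frac{y^2}{n}\right)^{-n/2}\right\}.$$

Using the fact from elementary calculus that

$$\lim_{n \to \infty} \left(1 + \frac{y^2}{n}\right)^n = e^{y^2},$$

the limit associated with the third factor is clearly the p.d.f. of the standard
normal distribution. The second limit obviously equals 1. If we knew more
about the gamma function, it would be easy to show that the first limit also
equals 1. Thus we have

$$\lim_{n \to \infty} F_n(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy,$$

and hence $T_n$ has a limiting standard normal distribution.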
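
The pointwise convergence of the $t$ p.d.f. to the standard normal p.d.f. can be checked numerically. The sketch below uses `math.lgamma` to evaluate the ratio of gamma functions on the log scale, which avoids overflow for large $n$; the function names and the evaluation point $y = 1$ are assumptions made for illustration.

```python
import math


def t_pdf(y, n):
    """P.d.f. of the t-distribution with n degrees of freedom, computed on the log scale."""
    log_const = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(math.pi * n)
    return math.exp(log_const - (n + 1) / 2 * math.log1p(y * y / n))


def std_normal_pdf(y):
    """P.d.f. of the standard normal distribution."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)


y = 1.0
for n in (1, 5, 30, 1000):
    print(n, t_pdf(y, n), std_normal_pdf(y))
```

For $n = 1$ the function reduces to the Cauchy p.d.f. $1/[\pi(1 + y^2)]$, and as $n$ grows the printed values approach the normal p.d.f. at the same point.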


EXERCISES 


5.1. Let $\bar X_n$ denote the mean of a random sample of size $n$ from a distribution
that is $N(\mu, \sigma^2)$. Find the limiting distribution of $\bar X_n$.

5.2. Let $Y_1$ denote the first order statistic of a random sample of size $n$ from
a distribution that has the p.d.f. $f(x) = e^{-(x - \theta)}$, $\theta < x < \infty$, zero elsewhere.
Let $Z_n = n(Y_1 - \theta)$. Investigate the limiting distribution of $Z_n$.

5.3. Let $Y_n$ denote the $n$th order statistic of a random sample from a
distribution of the continuous type that has distribution function $F(x)$
and p.d.f. $f(x) = F'(x)$. Find the limiting distribution of $Z_n = n[1 - F(Y_n)]$.

5.4. Let $Y_2$ denote the second order statistic of a random sample of size $n$
from a distribution of the continuous type that has distribution function
$F(x)$ and p.d.f. $f(x) = F'(x)$. Find the limiting distribution of $W_n = nF(Y_2)$.

5.5. Let the p.d.f. of $Y_n$ be $f_n(y) = 1$, $y = n$, zero elsewhere. Show that $Y_n$ does
not have a limiting distribution. (In this case, the probability has "escaped"
to infinity.)

5.6. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution that
is $N(\mu, \sigma^2)$, where $\sigma^2 > 0$. Show that the sum $Z_n = \sum_{1}^{n} X_i$ does not have a
limiting distribution.

5.2 Convergence in Probability

In the discussion concerning convergence in distribution, it
was noted that it was really the sequence of distribution functions
that converges to what we call a limiting distribution function.
Convergence in probability is quite different, although we demonstrate
that in a special case there is a relationship between the two concepts.

Definition 2. A sequence of random variables $X_1, X_2, X_3, \ldots$
converges in probability to a random variable $X$ if, for every $\epsilon > 0$,

$$\lim_{n \to \infty} \Pr(|X_n - X| < \epsilon) = 1,$$

or equivalently,

$$\lim_{n \to \infty} \Pr(|X_n - X| \ge \epsilon) = 0.$$

Statisticians are usually interested in this convergence when the
random variable $X$ is a constant, that is, when the random variable $X$
has a degenerate distribution at that constant. Hence we concentrate
on that situation.

Example 1. Let $\bar X_n$ denote the mean of a random sample of size $n$ from
a distribution that has mean $\mu$ and positive variance $\sigma^2$. Then the mean and
variance of $\bar X_n$ are $\mu$ and $\sigma^2/n$. Consider, for every fixed $\epsilon > 0$, the probability

$$\Pr(|\bar X_n - \mu| \ge \epsilon) = \Pr\left(|\bar X_n - \mu| \ge \frac{k\sigma}{\sqrt{n}}\right),$$

where $k = \epsilon\sqrt{n}/\sigma$. In accordance with the inequality of Chebyshev, this
probability is less than or equal to $1/k^2 = \sigma^2/n\epsilon^2$. So, for every fixed $\epsilon > 0$, we
have

$$\lim_{n \to \infty} \Pr(|\bar X_n - \mu| \ge \epsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n\epsilon^2} = 0.$$

Hence $\bar X_n$, $n = 1, 2, 3, \ldots$, converges in probability to $\mu$ if $\sigma^2$ is finite. (In a
more advanced course, the student will learn that $\mu$ finite is sufficient to ensure
this convergence in probability.) This result is called the weak law of large
numbers.
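
The Chebyshev bound $\sigma^2/n\epsilon^2$ is usually far from sharp, and it may help to compare it with an exact probability. The Python sketch below does this for a sample of 0-1 random variables, for which the sum is binomial and the tail probability can be computed exactly with rational arithmetic. The choice of this parent distribution, the values $p = \frac{1}{2}$ and $\epsilon = \frac{1}{10}$, and the function names are all assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def chebyshev_bound(sigma2, n, eps):
    """Upper bound sigma^2 / (n eps^2) on Pr(|Xbar_n - mu| >= eps)."""
    return sigma2 / (n * eps ** 2)


def exact_tail_bernoulli(p, n, eps):
    """Exact Pr(|Xbar_n - p| >= eps) when each X_i is 1 with probability p, else 0."""
    total = Fraction(0)
    for k in range(n + 1):  # k = number of ones, so Xbar_n = k/n
        if abs(Fraction(k, n) - p) >= eps:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


p, eps = Fraction(1, 2), Fraction(1, 10)  # hypothetical values
for n in (25, 100, 400):
    bound = chebyshev_bound(p * (1 - p), n, eps)
    exact = exact_tail_bernoulli(p, n, eps)
    print(n, float(exact), float(bound))
```

In each row the exact probability lies below the bound, and both columns decrease toward zero as $n$ increases, which is the content of the weak law for this particular distribution.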

Remark. A stronger type of convergence is given by $\Pr(\lim_{n \to \infty} Y_n = c) = 1$;
in this case we say that $Y_n$, $n = 1, 2, 3, \ldots$, converges to $c$ with probability 1.
Although we do not consider this type of convergence, it is known that the
mean $\bar X_n$, $n = 1, 2, 3, \ldots$, of a random sample converges with probability 1
to the mean $\mu$ of the distribution, provided that the latter exists. This is one
form of the strong law of large numbers.

We prove a theorem that relates a certain limiting distribution to
convergence in probability to a constant.

Theorem 1. Let $F_n(y)$ denote the distribution function of a random
variable $Y_n$ whose distribution depends upon the positive integer $n$. Let
$c$ denote a constant which does not depend upon $n$. The sequence $Y_n$,
$n = 1, 2, 3, \ldots$, converges in probability to the constant $c$ if and only if
the limiting distribution of $Y_n$ is degenerate at $y = c$.

Proof. First, assume that $\lim_{n \to \infty} \Pr(|Y_n - c| < \epsilon) = 1$ for every
$\epsilon > 0$. We are to prove that the random variable $Y_n$ is such that

$$\lim_{n \to \infty} F_n(y) = \begin{cases} 0, & y < c, \\ 1, & y > c. \end{cases}$$

Note that we do not need to know anything about $\lim_{n \to \infty} F_n(c)$. For
if the limit of $F_n(y)$ is as indicated, then $Y_n$ has a limiting distribution
with distribution function

$$F(y) = \begin{cases} 0, & y < c, \\ 1, & y \ge c. \end{cases}$$

Now

$$\Pr(|Y_n - c| < \epsilon) = F_n[(c + \epsilon)-] - F_n(c - \epsilon),$$

where $F_n[(c + \epsilon)-]$ is the left-hand limit of $F_n(y)$ at $y = c + \epsilon$. Thus
we have

$$1 = \lim_{n \to \infty} \Pr(|Y_n - c| < \epsilon) = \lim_{n \to \infty} F_n[(c + \epsilon)-] - \lim_{n \to \infty} F_n(c - \epsilon).$$

Because $0 \le F_n(y) \le 1$ for all values of $y$ and for every positive integer
$n$, it must be that

$$\lim_{n \to \infty} F_n(c - \epsilon) = 0, \qquad \lim_{n \to \infty} F_n[(c + \epsilon)-] = 1.$$

Since this is true for every $\epsilon > 0$, we have

$$\lim_{n \to \infty} F_n(y) = \begin{cases} 0, & y < c, \\ 1, & y > c, \end{cases}$$

as we were required to show.
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To complete the proof of Theorem 1, we assume that 

lim F„(y) = 0, y<c, 

n-^OQ 

=1 ， y> c. 


We are to prove that lim Pr (| — c| < t) = 1 for every £ > 0. Because 


n-MD 


Pr (I Y„-c\<e) = F n [{c + £>-]- F n (c - £>, 


and because it is given that 


lim F h [{c + £)—] 


lim F„(c -£) = 0 ， 

for every £ > 0, we have the desired result. This completes the proof 
of the theorem. 


For convenience, in the notation of Theorem 1, we sometimes say
that $Y_n$, rather than the sequence $Y_1, Y_2, Y_3, \ldots$, converges in
probability to the constant $c$.


EXERCISES 

5.7. Let the random variable $Y_n$ have a distribution that is $b(n, p)$.

(a) Prove that $Y_n/n$ converges in probability to $p$. This result is one form
of the weak law of large numbers.

(b) Prove that $1 - Y_n/n$ converges in probability to $1 - p$.

5.8. Let $S_n^2$ denote the variance of a random sample of size $n$ from a
distribution that is $N(\mu, \sigma^2)$. Prove that $nS_n^2/(n - 1)$ converges in probability
to $\sigma^2$.

5.9. Let $W_n$ denote a random variable with mean $\mu$ and variance $b/n^p$, where
$p > 0$, $\mu$, and $b$ are constants (not functions of $n$). Prove that $W_n$ converges
in probability to $\mu$.

Hint: Use Chebyshev's inequality.

5.10. Let $Y_n$ denote the $n$th order statistic of a random sample of size $n$ from
a uniform distribution on the interval $(0, \theta)$, as in Example 1 of Section 5.1.
Prove that $Z_n = \sqrt{Y_n}$ converges in probability to $\sqrt{\theta}$.

5.3 Limiting Moment-Generating Functions

To find the limiting distribution function of a random variable $Y_n$
by use of the definition of limiting distribution function obviously
requires that we know $F_n(y)$ for each positive integer $n$. But, as
indicated in the introductory remarks of Section 5.1, this is precisely
the problem we should like to avoid. If it exists, the moment-generating
function that corresponds to the distribution function $F_n(y)$ often
provides a convenient method of determining the limiting distribution
function. To emphasize that the distribution of a random variable $Y_n$
depends upon the positive integer $n$, in this chapter we shall write the
moment-generating function of $Y_n$ in the form $M(t; n)$.

The following theorem, which is essentially Curtiss' modification
of a theorem of Lévy and Cramér, explains how the moment-generating
function may be used in problems of limiting distributions. A proof
of the theorem requires a knowledge of that same facet of analysis that
permitted us to assert that a moment-generating function, when it
exists, uniquely determines a distribution. Accordingly, no proof of the
theorem will be given.


Theorem 2. Let the random variable $Y_n$ have the distribution
function $F_n(y)$ and the moment-generating function $M(t; n)$ that exists
for $-h < t < h$ for all $n$. If there exists a distribution function $F(y)$,
with corresponding moment-generating function $M(t)$, defined for
$|t| \le h_1 < h$, such that $\lim_{n \to \infty} M(t; n) = M(t)$, then $Y_n$ has a limiting
distribution with distribution function $F(y)$.


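In this and the subsequent sections are several illustrations of the
use of Theorem 2. In some of these examples it is convenient to use a
certain limit that is established in some courses in advanced calculus.
We refer to a limit of the form

$$\lim_{n \to \infty} \left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn},$$

where $b$ and $c$ do not depend upon $n$ and where $\lim_{n \to \infty} \psi(n) = 0$. Then

$$\lim_{n \to \infty} \left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn} = \lim_{n \to \infty} \left(1 + \frac{b}{n}\right)^{cn} = e^{bc}.$$

For example,

$$\lim_{n \to \infty} \left(1 - \frac{t^2}{n} + \frac{t^3}{n^{3/2}}\right)^{-n/2} = \lim_{n \to \infty} \left(1 - \frac{t^2}{n} + \frac{t^3/\sqrt{n}}{n}\right)^{-n/2}.$$

Here $b = -t^2$, $c = -\frac{1}{2}$, and $\psi(n) = t^3/\sqrt{n}$. Accordingly, for every fixed
value of $t$, the limit is $e^{t^2/2}$.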
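
A quick numerical check of this limit proposition can build intuition. The sketch below evaluates an expression of the stated form with $b = -t^2$, $c = -\frac{1}{2}$, and $\psi(n) = t^3/\sqrt{n}$, that is, $(1 - t^2/n + t^3/n^{3/2})^{-n/2}$, and compares it with $e^{bc} = e^{t^2/2}$. The function name `sequence_term` and the choice $t = 1$ are illustrative assumptions.

```python
from math import exp

def sequence_term(t, n):
    """Value of (1 - t^2/n + t^3/n^(3/2))^(-n/2) for a fixed t."""
    return (1 - t**2 / n + t**3 / n**1.5) ** (-n / 2)

t = 1.0                      # illustrative fixed value of t
limit = exp(t**2 / 2)        # e^{bc} with b = -t^2 and c = -1/2
for n in (10, 1_000, 100_000, 10_000_000):
    term = sequence_term(t, n)
    print(f"n = {n:>10,d}   term = {term:.6f}   |term - limit| = {abs(term - limit):.2e}")
```

The gap closes slowly, roughly like $1/\sqrt{n}$, because $\psi(n)$ itself tends to zero only at that rate.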


Example 1. Let $Y_n$ have a distribution that is $b(n, p)$. Suppose that the
mean $\mu = np$ is the same for every $n$; that is, $p = \mu/n$, where $\mu$ is a constant.
We shall find the limiting distribution of the binomial distribution, when
$p = \mu/n$, by finding the limit of $M(t; n)$. Now

$$M(t; n) = E(e^{tY_n}) = [(1 - p) + pe^t]^n = \left[1 + \frac{\mu(e^t - 1)}{n}\right]^n$$

for all real values of $t$. Hence we have

$$\lim_{n \to \infty} M(t; n) = e^{\mu(e^t - 1)}$$

for all real values of $t$. Since there exists a distribution, namely the Poisson
distribution with mean $\mu$, that has this m.g.f. $e^{\mu(e^t - 1)}$, then, in accordance with
the theorem and under the conditions stated, it is seen that $Y_n$ has a limiting
Poisson distribution with mean $\mu$.

Whenever a random variable has a limiting distribution, we may, if we
wish, use the limiting distribution as an approximation to the exact
distribution function. The result of this example enables us to use the Poisson
distribution as an approximation to the binomial distribution when $n$ is large
and $p$ is small. This is clearly an advantage, for it is easy to provide tables for
the one-parameter Poisson distribution. On the other hand, the binomial
distribution has two parameters, and tables for this distribution are very
ungainly. To illustrate the use of the approximation, let $Y$ have a binomial
distribution with $n = 50$ and $p = \frac{1}{25}$. Then

$$\Pr(Y \le 1) = \left(\tfrac{24}{25}\right)^{50} + 50\left(\tfrac{1}{25}\right)\left(\tfrac{24}{25}\right)^{49} = 0.400,$$

approximately. Since $\mu = np = 2$, the Poisson approximation to this probability
is

$$e^{-2} + 2e^{-2} = 0.406.$$
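
The two numbers in this illustration are easy to reproduce. The following sketch computes the exact binomial probability $\Pr(Y \le 1)$ for $n = 50$, $p = \frac{1}{25}$ and its Poisson approximation with mean 2; the helper names `binom_cdf` and `poisson_cdf` are hypothetical and are introduced only for this check.

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """Pr(Y <= k) for Y distributed b(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))

def poisson_cdf(k, mu):
    """Pr(Y <= k) for Y Poisson with mean mu."""
    return sum(exp(-mu) * mu**y / factorial(y) for y in range(k + 1))

n, p = 50, 1 / 25
exact = binom_cdf(1, n, p)          # (24/25)^50 + 50 (1/25) (24/25)^49
approx = poisson_cdf(1, n * p)      # e^{-2} + 2 e^{-2}
print(f"binomial: {exact:.3f}   Poisson approximation: {approx:.3f}")
```

The same two helpers can be reused with other values of `n` and `p` to see how the agreement improves as $n$ grows with $np$ held fixed.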


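Example 2. Let $Z_n$ be $\chi^2(n)$. Then the m.g.f. of $Z_n$ is $(1 - 2t)^{-n/2}$, $t < \frac{1}{2}$. The
mean and the variance of $Z_n$ are, respectively, $n$ and $2n$. The limiting
distribution of the random variable $Y_n = (Z_n - n)/\sqrt{2n}$ will be investigated.
Now the m.g.f. of $Y_n$ is

$$M(t; n) = E\left\{\exp\left[t\left(\frac{Z_n - n}{\sqrt{2n}}\right)\right]\right\} = e^{-tn/\sqrt{2n}}\, E\left(e^{tZ_n/\sqrt{2n}}\right)$$

$$= \exp\left[-\left(t\sqrt{\frac{2}{n}}\right)\left(\frac{n}{2}\right)\right]\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \qquad t < \frac{\sqrt{2n}}{2}.$$

This may be written in the form

$$M(t; n) = \left(e^{t\sqrt{2/n}} - t\sqrt{\frac{2}{n}}\, e^{t\sqrt{2/n}}\right)^{-n/2}, \qquad t < \sqrt{\frac{n}{2}}.$$

In accordance with Taylor's formula, there exists a number $\xi(n)$, between 0
and $t\sqrt{2/n}$, such that

$$e^{t\sqrt{2/n}} = 1 + t\sqrt{\frac{2}{n}} + \frac{1}{2}\left(t\sqrt{\frac{2}{n}}\right)^2 + \frac{e^{\xi(n)}}{6}\left(t\sqrt{\frac{2}{n}}\right)^3.$$

If this sum is substituted for $e^{t\sqrt{2/n}}$ in the last expression for $M(t; n)$, it is seen
that

$$M(t; n) = \left(1 - \frac{t^2}{n} + \frac{\psi(n)}{n}\right)^{-n/2},$$

where

$$\psi(n) = \frac{\sqrt{2}\, t^3 e^{\xi(n)}}{3\sqrt{n}} - \frac{\sqrt{2}\, t^3}{\sqrt{n}} - \frac{2t^4 e^{\xi(n)}}{3n}.$$

Since $\xi(n) \to 0$ as $n \to \infty$, then $\lim \psi(n) = 0$ for every fixed value of $t$. In
accordance with the limit proposition cited earlier in this section, we have

$$\lim_{n \to \infty} M(t; n) = e^{t^2/2}$$

for all real values of $t$. That is, the random variable $Y_n = (Z_n - n)/\sqrt{2n}$ has
a limiting standard normal distribution.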
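
The convergence of this m.g.f. can also be checked numerically. The sketch below evaluates the m.g.f. of $Y_n = (Z_n - n)/\sqrt{2n}$, with $Z_n$ taken to be $\chi^2(n)$, in the form $\exp[-t\sqrt{n/2}]\,(1 - t\sqrt{2/n})^{-n/2}$ and compares it with $e^{t^2/2}$. It works on the log scale to avoid overflow for large $n$; that is a numerical convenience chosen here, not part of the text's argument, and the function name is hypothetical.

```python
from math import exp, log1p, sqrt

def mgf_standardized_chi2(t, n):
    """m.g.f. of (Z_n - n)/sqrt(2n) for Z_n chi-square(n), valid for t < sqrt(n/2)."""
    a = t * sqrt(2 / n)
    if a >= 1:
        raise ValueError("the m.g.f. exists only for t < sqrt(n/2)")
    # log M(t; n) = -(n/2) * a - (n/2) * log(1 - a)
    return exp(-(n / 2) * (a + log1p(-a)))

t = 1.0                       # illustrative fixed value of t
target = exp(t**2 / 2)        # m.g.f. of the standard normal at t
for n in (10, 1_000, 1_000_000):
    value = mgf_standardized_chi2(t, n)
    print(f"n = {n:>9,d}   M(t; n) = {value:.6f}   e^(t^2/2) = {target:.6f}")
```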


EXERCISES 

5.11. Let $X_n$ have a gamma distribution with parameter $\alpha = n$ and $\beta$, where
$\beta$ is not a function of $n$. Let $Y_n = X_n/n$. Find the limiting distribution of $Y_n$.

5.12. Let $Z_n$ be $\chi^2(n)$ and let $W_n = Z_n/n^2$. Find the limiting distribution of $W_n$.

5.13. Let $X$ be $\chi^2(50)$. Approximate $\Pr(40 < X < 60)$.

5.14. Let $p = 0.95$ be the probability that a man, in a certain age group, lives
at least 5 years.

(a) If we are to observe 60 such men and if we assume independence, find
the probability that at least 56 of them live 5 or more years.

(b) Find an approximation to the result of part (a) by using the Poisson
distribution.

Hint: Redefine $p$ to be 0.05 and $1 - p = 0.95$.

5.15. Let the random variable $Z_n$ have a Poisson distribution with parameter
$\mu = n$. Show that the limiting distribution of the random variable
$Y_n = (Z_n - n)/\sqrt{n}$ is normal with mean zero and variance 1.

5.16. Let $S_n^2$ denote the variance of a random sample of size $n$ from a
distribution that is $N(\mu, \sigma^2)$. It has been proved that $nS_n^2/(n - 1)$ converges
in probability to $\sigma^2$. Prove that $S_n^2$ converges in probability to $\sigma^2$.

5.17. Let $X_n$ and $Y_n$ have a bivariate normal distribution with parameters $\mu_1$,
$\mu_2$, $\sigma_1^2$, $\sigma_2^2$ (free of $n$) but $\rho = 1 - 1/n$. Consider the conditional distribution
of $Y_n$, given $X_n = x$. Investigate the limit of this conditional distribution as
$n \to \infty$. What is the limiting distribution if $\rho = -1 + 1/n$? Reference to
these facts was made in the Remark in Section 2.3.

5.18. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a Poisson
distribution with parameter $\mu = 1$.

(a) Show that the m.g.f. of $Y_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma = \sqrt{n}(\bar{X}_n - 1)$ is given by
$\exp[-t\sqrt{n} + n(e^{t/\sqrt{n}} - 1)]$.

(b) Investigate the limiting distribution of $Y_n$ as $n \to \infty$.

Hint: Replace, by its Maclaurin series, the expression $e^{t/\sqrt{n}}$, which is
in the exponent of the moment-generating function of $Y_n$.

5.19. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a distribution
that has p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere.

(a) Show that the m.g.f. $M(t; n)$ of $Y_n = \sqrt{n}(\bar{X}_n - 1)$ is equal to
$[e^{t/\sqrt{n}} - (t/\sqrt{n})e^{t/\sqrt{n}}]^{-n}$, $t < \sqrt{n}$.

(b) Find the limiting distribution of $Y_n$ as $n \to \infty$.

This exercise and the immediately preceding one are special instances
of an important theorem that will be proved in the next section.

5.4 The Central Limit Theorem 

It was seen (Section 4.8) that, if $X_1, X_2, \ldots, X_n$ is a random sample
from a normal distribution with mean $\mu$ and variance $\sigma^2$, the random
variable

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$$

is, for every positive integer $n$, normally distributed with zero mean and
unit variance. In probability theory there is a very elegant theorem
called the central limit theorem. A special case of this theorem asserts
the remarkable and important fact that if $X_1, X_2, \ldots, X_n$ denote the
observations of a random sample of size $n$ from any distribution having
positive variance $\sigma^2$ (and hence finite mean $\mu$), then the random variable
$\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution. If this
fact can be established, it will imply, whenever the conditions of the
theorem are satisfied, that (for large $n$) the random variable
$\sqrt{n}(\bar{X} - \mu)/\sigma$ has an approximate normal distribution with mean zero
and variance 1. It will then be possible to use this approximate normal
distribution to compute approximate probabilities concerning $\bar{X}$.

The more general form of the theorem is stated, but it is proved only
in the modified case. However, this is exactly the proof of the theorem
that would be given if we could use the characteristic function in place
of the m.g.f.


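Theorem 3. Let $X_1, X_2, \ldots, X_n$ denote the observations of a random
sample from a distribution that has mean $\mu$ and positive variance $\sigma^2$. Then
the random variable $Y_n = \left(\sum_{i=1}^{n} X_i - n\mu\right)/\sqrt{n}\,\sigma = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a
limiting distribution that is normal with mean zero and variance 1.

Proof. In the modification of the proof, we assume the existence of
the m.g.f. $M(t) = E(e^{tX})$, $-h < t < h$, of the distribution. However,
this proof is essentially the same one that would be given for this
theorem in a more advanced course by replacing the m.g.f. by the
characteristic function $\varphi(t) = E(e^{itX})$.

The function

$$m(t) = E[e^{t(X - \mu)}] = e^{-\mu t} M(t)$$

also exists for $-h < t < h$. Since $m(t)$ is the m.g.f. for $X - \mu$, it
must follow that $m(0) = 1$, $m'(0) = E(X - \mu) = 0$, and $m''(0) =
E[(X - \mu)^2] = \sigma^2$. By Taylor's formula there exists a number $\xi$ between
0 and $t$ such that

$$m(t) = m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} = 1 + \frac{m''(\xi)t^2}{2}.$$

If $\sigma^2 t^2/2$ is added and subtracted, then

$$m(t) = 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2}. \tag{1}$$

Next consider $M(t; n)$, where

$$M(t; n) = E\left[\exp\left(t\,\frac{\sum X_i - n\mu}{\sigma\sqrt{n}}\right)\right]$$

$$= E\left[\exp\left(t\,\frac{X_1 - \mu}{\sigma\sqrt{n}}\right)\exp\left(t\,\frac{X_2 - \mu}{\sigma\sqrt{n}}\right)\cdots\exp\left(t\,\frac{X_n - \mu}{\sigma\sqrt{n}}\right)\right]$$

$$= E\left[\exp\left(t\,\frac{X_1 - \mu}{\sigma\sqrt{n}}\right)\right]\cdots E\left[\exp\left(t\,\frac{X_n - \mu}{\sigma\sqrt{n}}\right)\right]$$

$$= \left\{E\left[\exp\left(t\,\frac{X - \mu}{\sigma\sqrt{n}}\right)\right]\right\}^n = \left[m\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n, \qquad -h < \frac{t}{\sigma\sqrt{n}} < h.$$

In Equation (1) replace $t$ by $t/\sigma\sqrt{n}$ to obtain

$$m\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2},$$

where now $\xi$ is between 0 and $t/\sigma\sqrt{n}$ with $-h\sigma\sqrt{n} < t < h\sigma\sqrt{n}$.
Accordingly,

$$M(t; n) = \left\{1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2}\right\}^n.$$

Since $m''(t)$ is continuous at $t = 0$ and since $\xi \to 0$ as $n \to \infty$, we have

$$\lim_{n \to \infty} [m''(\xi) - \sigma^2] = 0.$$

The limit proposition cited in Section 5.3 shows that

$$\lim_{n \to \infty} M(t; n) = e^{t^2/2}$$

for all real values of $t$. This proves that the random variable $Y_n =
\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution.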
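
The key identity of the proof, $M(t; n) = [m(t/\sigma\sqrt{n})]^n$, can be tried on a concrete distribution. As an assumption made only for this sketch, take the uniform distribution on $(0, 1)$, for which $\mu = \frac{1}{2}$, $\sigma^2 = \frac{1}{12}$, and the m.g.f. of $X - \mu$ is $m(s) = \sinh(s/2)/(s/2)$. The function names are hypothetical.

```python
from math import exp, sinh, sqrt

def m_centered_uniform(s):
    """m.g.f. of X - 1/2 when X is uniform on (0, 1): sinh(s/2) / (s/2)."""
    if s == 0:
        return 1.0
    return sinh(s / 2) / (s / 2)

def mgf_standardized_mean(t, n):
    """M(t; n) = [m(t / (sigma * sqrt(n)))]^n for the uniform (0, 1) case."""
    sigma = sqrt(1 / 12)
    return m_centered_uniform(t / (sigma * sqrt(n))) ** n

t = 1.0                       # illustrative fixed value of t
target = exp(t**2 / 2)        # standard normal m.g.f. at t
for n in (1, 10, 100, 10_000):
    value = mgf_standardized_mean(t, n)
    print(f"n = {n:>6,d}   M(t; n) = {value:.6f}   e^(t^2/2) = {target:.6f}")
```

For this symmetric distribution the third central moment is zero, so the convergence is noticeably faster than in the chi-square example of Section 5.3.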

We interpret this theorem as saying that, when $n$ is a large, fixed
positive integer, the random variable $\bar{X}$ has an approximate normal
distribution with mean $\mu$ and variance $\sigma^2/n$; and in applications we use
the approximate normal p.d.f. as though it were the exact p.d.f. of $\bar{X}$.

Some illustrative examples, here and later, will help show the
importance of this version of the central limit theorem.

Example 1. Let $\bar{X}$ denote the mean of a random sample of size 75 from
the distribution that has the p.d.f.

$$f(x) = \begin{cases} 1, & 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

It was stated in Section 5.1 that the exact p.d.f. of $\bar{X}$, say $g(\bar{x})$, is rather
complicated. It can be shown that $g(\bar{x})$ has a graph when $0 < \bar{x} < 1$ that is
composed of arcs of 75 different polynomials of degree 74. The computation
of such a probability as $\Pr(0.45 < \bar{X} < 0.55)$ would be extremely laborious.
The conditions of the theorem are satisfied, since $M(t)$ exists for all real values
of $t$. Moreover, $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$, so that we have approximately

$$\Pr(0.45 < \bar{X} < 0.55) = \Pr\left[\frac{\sqrt{n}(0.45 - \mu)}{\sigma} < \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} < \frac{\sqrt{n}(0.55 - \mu)}{\sigma}\right]$$

$$= \Pr[-1.5 < 30(\bar{X} - 0.5) < 1.5]$$

$$= 0.866,$$

from Table III in Appendix B.
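
The table lookup in this example can be reproduced with the error function. The sketch below standardizes the two endpoints using $n = 75$, $\mu = \frac{1}{2}$, $\sigma^2 = \frac{1}{12}$ and evaluates the standard normal distribution function through `math.erf`; the helper name `std_normal_cdf` is an assumption of this sketch, standing in for Table III.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, mu, var = 75, 0.5, 1 / 12
sigma = sqrt(var)
z_lo = sqrt(n) * (0.45 - mu) / sigma   # about -1.5
z_hi = sqrt(n) * (0.55 - mu) / sigma   # about  1.5
prob = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)
print(f"z_lo = {z_lo:.3f}, z_hi = {z_hi:.3f}, approximate probability = {prob:.3f}")
```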

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $b(1, p)$. Here $\mu = p$, $\sigma^2 = p(1 - p)$, and $M(t)$ exists for
all real values of $t$. If $Y_n = X_1 + \cdots + X_n$, it is known that $Y_n$ is $b(n, p)$.
Calculation of probabilities concerning $Y_n$, when we do not use the Poisson
approximation, can be greatly simplified by making use of the fact that
$(Y_n - np)/\sqrt{np(1 - p)} = \sqrt{n}(\bar{X}_n - p)/\sqrt{p(1 - p)} = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting
distribution that is normal with mean zero and variance 1. Frequently,
statisticians say that $Y_n$, or more simply $Y$, has an approximate normal
distribution with mean $np$ and variance $np(1 - p)$. Even with $n$ as small as 10,
with $p = \frac{1}{2}$ so that the binomial distribution is symmetric about $np = 5$, we
note in Figure 5.1 how well the normal distribution, $N(5, \frac{5}{2})$, fits the binomial
distribution, $b(10, \frac{1}{2})$, where the heights of the rectangles represent the
probabilities of the respective integers $0, 1, 2, \ldots, 10$. Note that the area of
the rectangle whose base is $(k - 0.5, k + 0.5)$ and the area under the normal
p.d.f. between $k - 0.5$ and $k + 0.5$ are approximately equal for each
$k = 0, 1, 2, \ldots, 10$, even with $n = 10$. This example should help the reader
understand Example 3.

FIGURE 5.1


Example 3. With the background of Example 2, let $n = 100$ and $p = \frac{1}{2}$,
and suppose that we wish to compute $\Pr(Y = 48, 49, 50, 51, 52)$. Since $Y$ is
a random variable of the discrete type, the events $Y = 48, 49, 50, 51, 52$
and $47.5 < Y < 52.5$ are equivalent. That is, $\Pr(Y = 48, 49, 50, 51, 52) =
\Pr(47.5 < Y < 52.5)$. Since $np = 50$ and $np(1 - p) = 25$, the latter probability
may be written

$$\Pr(47.5 < Y < 52.5) = \Pr\left(\frac{47.5 - 50}{5} < \frac{Y - 50}{5} < \frac{52.5 - 50}{5}\right) = \Pr\left(-0.5 < \frac{Y - 50}{5} < 0.5\right).$$

Since $(Y - 50)/5$ has an approximate normal distribution with mean zero and
variance 1, Table III shows this probability to be approximately 0.382.

The convention of selecting the event $47.5 < Y < 52.5$, instead of, say,
$47.8 < Y < 52.3$, as the event equivalent to the event $Y = 48, 49, 50, 51, 52$
seems to have originated in the following manner: The probability,
$\Pr(Y = 48, 49, 50, 51, 52)$, can be interpreted as the sum of five rectangular
areas where the rectangles have bases 1 but the heights are, respectively,
$\Pr(Y = 48), \ldots, \Pr(Y = 52)$. If these rectangles are so located that the
midpoints of their bases are, respectively, at the points $48, 49, \ldots, 52$ on a
horizontal axis, then in approximating the sum of these areas by an area
bounded by the horizontal axis, the graph of a normal p.d.f., and two
ordinates, it seems reasonable to take the two ordinates at the points 47.5 and
52.5.
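
The effect of the half-unit adjustment can be seen by comparing the exact binomial probability with two normal approximations: one using the ordinates 47.5 and 52.5, and one that naively uses 48 and 52. This is only a sketch; the helper `std_normal_cdf` is a stand-in for Table III, and the "naive" variant is included purely for contrast and is not a method suggested by the text.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))          # 50 and 5

# Exact Pr(Y = 48, 49, 50, 51, 52) from the b(100, 1/2) p.m.f.
exact = sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(48, 53))

# Normal approximation with ordinates at 47.5 and 52.5.
corrected = std_normal_cdf((52.5 - mean) / sd) - std_normal_cdf((47.5 - mean) / sd)

# For contrast only: ordinates at 48 and 52, without the half-unit adjustment.
naive = std_normal_cdf((52 - mean) / sd) - std_normal_cdf((48 - mean) / sd)

print(f"exact = {exact:.4f}   with 47.5 and 52.5 = {corrected:.4f}   with 48 and 52 = {naive:.4f}")
```

The approximation based on 47.5 and 52.5 agrees with the exact value to about three decimal places, while the unadjusted one falls well short because it leaves out half a rectangle at each end.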


We know that $\bar{X}$ and $\sum_{i=1}^{n} X_i$ have approximate normal distributions,
provided that $n$ is large enough. Later, we find that other statistics
also have approximate normal distributions, and this is the reason that
the normal distribution is so important to statisticians. That is, while
not many underlying distributions are normal, the distributions
of statistics calculated from random samples arising from these
distributions are often very close to being normal.

Frequently, we are interested in functions of statistics that have
approximate normal distributions. For illustration, $Y_n$ of Example 2
has an approximate $N[np, np(1 - p)]$. So $np(1 - p)$ is an important
function of $p$ as it is the variance of $Y_n$. Thus, if $p$ is unknown, we might
want to estimate the variance of $Y_n$. Since $E(Y_n/n) = p$, we might use
$n(Y_n/n)(1 - Y_n/n)$ as such an estimator and would want to know
something about the latter's distribution. In particular, does it also
have an approximate normal distribution? If so, what are its mean and
variance? To answer questions like these, we use a procedure that is
commonly called the delta method, which will be explained using the
sample mean $\bar{X}_n$ as the statistic.

We know that $\bar{X}_n$ converges in probability to $\mu$ and $\bar{X}_n$ is
approximately $N(\mu, \sigma^2/n)$. Suppose that we are interested in a function
of $\bar{X}_n$, say $u(\bar{X}_n)$. Since, for large $n$, $\bar{X}_n$ is close to $\mu$, we can approximate
$u(\bar{X}_n)$ by the first two terms of Taylor's expansion about $\mu$, namely

$$u(\bar{X}_n) \approx v(\bar{X}_n) = u(\mu) + (\bar{X}_n - \mu)u'(\mu),$$

where $u'(\mu)$ exists and is not zero. Since $v(\bar{X}_n)$ is a linear function of $\bar{X}_n$,
it has an approximate normal distribution with mean

$$E[v(\bar{X}_n)] = u(\mu) + E[(\bar{X}_n - \mu)]u'(\mu) = u(\mu)$$

and variance

$$\operatorname{var}[v(\bar{X}_n)] = [u'(\mu)]^2 \operatorname{var}(\bar{X}_n - \mu) = [u'(\mu)]^2\,\frac{\sigma^2}{n}.$$

Now, for large $n$, $u(\bar{X}_n)$ is approximately equal to $v(\bar{X}_n)$; so it has the
same approximating distribution. That is, $u(\bar{X}_n)$ is approximately
$N\{u(\mu), [u'(\mu)]^2\sigma^2/n\}$. More formally, we could say that

$$\frac{u(\bar{X}_n) - u(\mu)}{\sqrt{[u'(\mu)]^2\sigma^2/n}}$$

has a limiting standard normal distribution.


Example 4. Let $Y_n$ (or $Y$ for simplicity) be $b(n, p)$. Thus $Y/n$ is approximately
$N[p, p(1 - p)/n]$. Statisticians often look for functions of statistics
whose variances do not depend upon the parameter. Here the variance of $Y/n$
depends upon $p$. Can we find a function, say $u(Y/n)$, whose variance is
essentially free of $p$? Since $Y/n$ converges in probability to $p$, we
can approximate $u(Y/n)$ by the first two terms of its Taylor's expansion about
$p$, namely by

$$v\left(\frac{Y}{n}\right) = u(p) + \left(\frac{Y}{n} - p\right)u'(p).$$

Of course, $v(Y/n)$ is a linear function of $Y/n$ and thus also has an approximate
normal distribution; clearly, it has mean $u(p)$ and variance

$$[u'(p)]^2\,\frac{p(1 - p)}{n}.$$

But it is the latter that we want to be essentially free of $p$; thus we set it equal
to a constant, obtaining the differential equation

$$u'(p) = \frac{c}{\sqrt{p(1 - p)}}.$$

A solution of this is

$$u(p) = (2c)\arcsin\sqrt{p}.$$

If we take $c = \frac{1}{2}$, we have, since $u(Y/n)$ is approximately equal to $v(Y/n)$, that

$$u\left(\frac{Y}{n}\right) = \arcsin\sqrt{\frac{Y}{n}}.$$

This has an approximate normal distribution with mean $\arcsin\sqrt{p}$ and
variance $1/4n$, which is free of $p$.
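
The variance-stabilizing property can be checked directly, without simulation, by summing over the binomial p.m.f. The sketch below computes the exact variance of $Y/n$ and of $\arcsin\sqrt{Y/n}$ for several values of $p$ and compares the latter with $1/4n$. The sample size $n = 200$, the values of $p$, and the helper names are illustrative assumptions.

```python
from math import asin, comb, sqrt

def binom_pmf(y, n, p):
    """Pr(Y = y) for Y distributed b(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def variance_of(u, n, p):
    """Exact variance of u(Y/n) for Y distributed b(n, p)."""
    probs = [binom_pmf(y, n, p) for y in range(n + 1)]
    mean = sum(u(y / n) * w for y, w in enumerate(probs))
    return sum((u(y / n) - mean) ** 2 * w for y, w in enumerate(probs))

n = 200                          # illustrative sample size
target = 1 / (4 * n)             # delta-method variance of arcsin(sqrt(Y/n))
results = {}
for p in (0.3, 0.5, 0.7):
    raw = variance_of(lambda x: x, n, p)                    # equals p(1 - p)/n
    stabilized = variance_of(lambda x: asin(sqrt(x)), n, p)
    results[p] = (raw, stabilized)
    print(f"p = {p}:  var(Y/n) = {raw:.6f}   var(arcsin sqrt(Y/n)) = {stabilized:.6f}   1/(4n) = {target:.6f}")
```

The variance of $Y/n$ changes with $p$, while the variance of the transformed statistic stays close to $1/4n$ for each value tried.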


EXERCISES 

5.20. Let $\bar{X}$ denote the mean of a random sample of size 100 from a distribution
that is $\chi^2(50)$. Compute an approximate value of $\Pr(49 < \bar{X} < 51)$.

5.21. Let $\bar{X}$ denote the mean of a random sample of size 128 from a gamma
distribution with $\alpha = 2$ and $\beta = 4$. Approximate $\Pr(7 < \bar{X} < 9)$.

5.22. Let $Y$ be $b(72, \frac{1}{3})$. Approximate $\Pr(22 \le Y \le 28)$.

5.23. Compute an approximate probability that the mean of a random sample
of size 15 from a distribution having p.d.f. $f(x) = 3x^2$, $0 < x < 1$, zero
elsewhere, is between $\frac{3}{5}$ and $\frac{4}{5}$.

5.24. Let $Y$ denote the sum of the observations of a random sample of size
12 from a distribution having p.d.f. $f(x) = \frac{1}{6}$, $x = 1, 2, 3, 4, 5, 6$, zero
elsewhere. Compute an approximate value of $\Pr(36 \le Y \le 48)$.

Hint: Since the event of interest is $Y = 36, 37, \ldots, 48$, rewrite the
probability as $\Pr(35.5 < Y < 48.5)$.

5.25. Let $Y$ be $b(400, \frac{1}{5})$. Compute an approximate value of $\Pr(0.25 < Y/n)$.

5.26. If $Y$ is $b(100, \frac{1}{2})$, approximate the value of $\Pr(Y = 50)$.

5.27. Let $Y$ be $b(n, 0.55)$. Find the smallest value of $n$ so that (approximately)
$\Pr(Y/n > \frac{1}{2}) \ge 0.95$.

5.28. Let $f(x) = 1/x^2$, $1 < x < \infty$, zero elsewhere, be the p.d.f. of a random
variable $X$. Consider a random sample of size 72 from the distribution
having this p.d.f. Compute approximately the probability that more than
50 of the observations of the random sample are less than 3.

5.29. Forty-eight measurements are recorded to several decimal places. Each
of these 48 numbers is rounded off to the nearest integer. The sum of the
original 48 numbers is approximated by the sum of these integers. If we
assume that the errors made by rounding off are i.i.d. and have uniform
distributions over the interval $(-\frac{1}{2}, \frac{1}{2})$, compute approximately the
probability that the sum of the integers is within 2 units of the true sum.

5.30. We know that $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ for large $n$. Find the
approximate distribution of $u(\bar{X}) = \bar{X}^3$.

5.31. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution
with mean $\mu$. Thus $Y = \sum_{i=1}^{n} X_i$ has a Poisson distribution with mean $n\mu$.
Moreover, $\bar{X} = Y/n$ is approximately $N(\mu, \mu/n)$ for large $n$. Show that
$u(Y/n) = \sqrt{Y/n}$ is a function of $Y/n$ whose variance is essentially free of $\mu$.

5.5 Some Theorems on Limiting Distributions 

In this section, we shall present some theorems that can often be 
used to simplify the study of certain limiting distributions. 

Theorem 4. Let $F_n(u)$ denote the distribution function of a random
variable $U_n$ whose distribution depends upon the positive integer $n$. Let
$U_n$ converge in probability to the constant $c \ne 0$. The random variable
$U_n/c$ converges in probability to 1.

The proof of this theorem is very easy and is left as an exercise. 

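Theorem 5. Let $F_n(u)$ denote the distribution function of a random
variable $U_n$ whose distribution depends upon the positive integer $n$.
Further, let $U_n$ converge in probability to the positive constant $c$ and let
$\Pr(U_n < 0) = 0$ for every $n$. The random variable $\sqrt{U_n}$ converges in
probability to $\sqrt{c}$.

Proof. We are given that $\lim_{n \to \infty} \Pr(|U_n - c| \ge \epsilon) = 0$ for every
$\epsilon > 0$. We are to prove that $\lim_{n \to \infty} \Pr(|\sqrt{U_n} - \sqrt{c}| \ge \epsilon') = 0$ for every
$\epsilon' > 0$. Now the probability

$$\Pr(|U_n - c| \ge \epsilon) = \Pr\left[|(\sqrt{U_n} - \sqrt{c})(\sqrt{U_n} + \sqrt{c})| \ge \epsilon\right]$$

$$= \Pr\left(|\sqrt{U_n} - \sqrt{c}| \ge \frac{\epsilon}{\sqrt{U_n} + \sqrt{c}}\right) \ge \Pr\left(|\sqrt{U_n} - \sqrt{c}| \ge \frac{\epsilon}{\sqrt{c}}\right) \ge 0.$$

If we let $\epsilon' = \epsilon/\sqrt{c}$, and if we take the limit, as $n$ becomes infinite, we
have

$$0 = \lim_{n \to \infty} \Pr(|U_n - c| \ge \epsilon) \ge \lim_{n \to \infty} \Pr(|\sqrt{U_n} - \sqrt{c}| \ge \epsilon') = 0$$

for every $\epsilon' > 0$. This completes the proof.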


The conclusions of Theorems 4 and 5 are very natural ones and
they certainly appeal to our intuition. There are many other theorems
of this flavor in probability theory. As exercises, it is to be shown that
if the random variables $U_n$ and $V_n$ converge in probability to the
respective constants $c$ and $d$, then $U_n V_n$ converges in probability to the
constant $cd$, and $U_n/V_n$ converges in probability to the constant $c/d$,
provided that $d \ne 0$. However, we shall accept, without proof, the
following theorem, which is a modification of Slutsky's theorem.


Theorem 6. Let $F_n(u)$ denote the distribution function of a random
variable $U_n$ whose distribution depends upon the positive integer $n$. Let
$U_n$ have a limiting distribution with distribution function $F(u)$. Let a
random variable $V_n$ converge in probability to 1. The limiting distribution
of the random variable $W_n = U_n/V_n$ is the same as that of $U_n$; that is, $W_n$
has a limiting distribution with distribution function $F(w)$.

Example 1. Let $Y_n$ denote a random variable that is $b(n, p)$, $0 < p < 1$. We
know that

$$U_n = \frac{Y_n - np}{\sqrt{np(1 - p)}}$$

has a limiting distribution that is $N(0, 1)$. Moreover, it has been proved that
$Y_n/n$ and $1 - Y_n/n$ converge in probability to $p$ and $1 - p$, respectively; thus
$(Y_n/n)(1 - Y_n/n)$ converges in probability to $p(1 - p)$. Then, by Theorem 4,
$(Y_n/n)(1 - Y_n/n)/[p(1 - p)]$ converges in probability to 1, and Theorem 5
asserts that the following does also:

$$V_n = \left[\frac{(Y_n/n)(1 - Y_n/n)}{p(1 - p)}\right]^{1/2}.$$

Thus, in accordance with Theorem 6, the ratio $W_n = U_n/V_n$, namely

$$\frac{Y_n - np}{\sqrt{n(Y_n/n)(1 - Y_n/n)}},$$

has a limiting distribution that is $N(0, 1)$. This fact enables us to write (with
$n$ a large, fixed positive integer)

$$\Pr\left[-2 < \frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} < 2\right] = 0.954,$$

approximately.
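
The approximate probability 0.954 can be compared with an exact calculation for a particular binomial distribution. The sketch below sums the $b(n, p)$ p.m.f. over those values $y$ for which the ratio $(y - np)/\sqrt{n(y/n)(1 - y/n)}$ lies strictly between $-2$ and $2$. The choices $n = 400$ and $p = 0.3$ are illustrative assumptions, and the function name is hypothetical.

```python
from math import comb, sqrt

def prob_within_two(n, p):
    """Exact Pr[-2 < (Y - np)/sqrt(n (Y/n)(1 - Y/n)) < 2] for Y distributed b(n, p)."""
    total = 0.0
    for y in range(1, n):            # y = 0 and y = n make the denominator zero
        w = (y - n * p) / sqrt(n * (y / n) * (1 - y / n))
        if -2 < w < 2:
            total += comb(n, y) * p**y * (1 - p) ** (n - y)
    return total

n, p = 400, 0.3                      # illustrative values, not from the text
exact = prob_within_two(n, p)
print(f"exact probability = {exact:.4f}   (normal approximation: 0.954)")
```

Because $Y$ is discrete, the exact value moves slightly as $n$ and $p$ change, but for $n$ this large it stays close to the limiting value.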


Example 2. Let $\bar{X}_n$ and $S_n^2$ denote, respectively, the mean and the variance
of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$, $\sigma^2 > 0$. It
has been proved that $\bar{X}_n$ converges in probability to $\mu$ and that $S_n^2$ converges
in probability to $\sigma^2$. Theorem 5 asserts that $S_n$ converges in probability to $\sigma$
and Theorem 4 tells us that $S_n/\sigma$ converges in probability to 1. In accordance
with Theorem 6, the random variable $W_n = \sigma\bar{X}_n/S_n$ has the same limiting
distribution as does $\bar{X}_n$. That is, $\sigma\bar{X}_n/S_n$ converges in probability to $\mu$.


EXERCISES 

5.32. Prove Theorem 4. 

Hint: Note that $\Pr(|U_n/c - 1| < \epsilon) = \Pr(|U_n - c| < \epsilon|c|)$, for every
$\epsilon > 0$. Then take $\epsilon' = \epsilon|c|$.

5.33. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a gamma
distribution with parameters $\alpha = \mu > 0$ and $\beta = 1$. Show that the limiting
distribution of $\sqrt{n}(\bar{X}_n - \mu)/\sqrt{\bar{X}_n}$ is $N(0, 1)$.

5.34. Let $T_n = (\bar{X}_n - \mu)/\sqrt{S_n^2/(n - 1)}$, where $\bar{X}_n$ and $S_n^2$ represent, respectively,
the mean and the variance of a random sample of size $n$ from a
distribution that is $N(\mu, \sigma^2)$. Prove that the limiting distribution of $T_n$
is $N(0, 1)$.

5.35. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be the observations of two independent
random samples, each of size $n$, from the distributions that have the
respective means $\mu_1$ and $\mu_2$ and the common variance $\sigma^2$. Find the limiting
distribution of

$$\frac{(\bar{X}_n - \bar{Y}_n) - (\mu_1 - \mu_2)}{\sigma\sqrt{2/n}},$$

where $\bar{X}_n$ and $\bar{Y}_n$ are the respective means of the samples.

Hint: Let $\bar{Z}_n = \sum_{i=1}^{n} Z_i/n$, where $Z_i = X_i - Y_i$.

5.36. Let $U_n$ and $V_n$ converge in probability to $c$ and $d$, respectively. Prove the
following.

(a) The sum $U_n + V_n$ converges in probability to $c + d$.

Hint: Show that $\Pr(|U_n + V_n - c - d| \ge \epsilon) \le \Pr(|U_n - c| + |V_n - d|
\ge \epsilon) \le \Pr(|U_n - c| \ge \epsilon/2 \text{ or } |V_n - d| \ge \epsilon/2) \le \Pr(|U_n - c| \ge \epsilon/2) +
\Pr(|V_n - d| \ge \epsilon/2)$.

(b) The product $U_n V_n$ converges in probability to $cd$.

(c) If $d \ne 0$, the ratio $U_n/V_n$ converges in probability to $c/d$.

5.37. Let $U_n$ converge in probability to $c$. If $h(u)$ is a continuous function at
$u = c$, prove that $h(U_n)$ converges in probability to $h(c)$.

Hint: For each $\epsilon > 0$, there exists a $\delta > 0$ such that $\Pr[|h(U_n) -
h(c)| < \epsilon] \ge \Pr[|U_n - c| < \delta]$. Why?

ADDITIONAL EXERCISES 

5.38. A nail manufacturer guarantees that not more than one nail in a box 
of 100 nails is defective. If, in fact, the probability of each individual nail 
being defective is p = 0.005, compute the probability that: 

(a) The next box of nails violates the guarantee. Use the Poisson 
approximation, after assuming independence. 

(b) The guarantee is violated at least once in the next 25 boxes. 

5.39. Let $\bar{X}_n$ and $\bar{Y}_n$ be the means of two independent random samples of
size $n$ from a distribution having variance $\sigma^2$. Determine $n$ so that
$\Pr(|\bar{X}_n - \bar{Y}_n| < \sigma/2) = 0.98$, approximately.

5.40. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from a distribution with p.d.f.
$f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere. Find $\Pr[0.48 < \bar{X}_n < 0.52]$
approximately.

5.41. A rolls an unbiased die 100 independent times and B rolls an unbiased
die 100 independent times. What is the approximate probability that A will
total at least 25 points more than B?

5.42. Compute, approximately, the probability that the sum of the
observations of a random sample of size 24 from a chi-square distribution
with 3 degrees of freedom is between 70 and 80.

5.43. Let $X$ be the number of times that no heads appear on two coins when
these two coins are tossed together $n$ times. Find the smallest value of $n$ so
that $\Pr(0.24 < X/n < 0.26) \ge 0.954$, approximately.

5.44. Two persons have 16 and 32 dollars, respectively. They bet one dollar 
on each of 900 independent tosses of an unbiased coin. What is an 
approximation to the probability that neither person is in debt at the end 
of the 900 trials? 

5.45. A die is rolled 720 independent times. Compute, approximately, the 
probability that the number of fives that appear will be between 110 and 
125 inclusive. 

5.46. A part is produced with a mean of 6.2 ounces and a standard deviation 
of 0.2 ounce. What is the probability that the weight of 100 such items is 
between 616 and 624 ounces? 

5.47. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size 25 from a distribution
having p.d.f. $f(x) = x/6$, $x = 1, 2, 3$, zero elsewhere. Approximate

$$\Pr\left(\sum_{i=1}^{25} X_i = 50, 51, \ldots, \text{or } 60\right).$$

5.48. Say that a lot of 1000 items contains 20 defective items. A sample of
size 50 is taken at random and without replacement from the lot. If 3 or
fewer defective items are found in this sample, the lot is accepted.
Approximate the probability of accepting this lot.

5.49. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having finite
$E(X^m)$, $m > 0$. Show that $\sum_{i=1}^{n} X_i^m/n$ converges in probability to $E(X^m)$. Was
an additional assumption needed?

5.50. It can be proved that the mean $\bar{X}_n$ of a random sample of size $n$ from
a Cauchy distribution has that same Cauchy distribution for every $n$. Thus
$\bar{X}_n$ does not converge in probability to zero. How can this be, as earlier,
under certain conditions, we proved that $\bar{X}_n$ converges in probability to the
mean of the distribution?

5.51. Let $Y$ be $\chi^2(n)$. What is the limiting distribution of $Z = \sqrt{Y} - \sqrt{n}$?

5.52. Let $\bar{X}$ be the mean of a random sample of size $n$ from a Poisson
distribution with parameter $\mu$. Find the function $Y = u(\bar{X})$ so that $Y$ has an
approximate normal distribution with mean $u(\mu)$ and variance that is free
of $\mu$.

5.53. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample
$X_1, X_2, \ldots, X_n$ of size $n$ from a distribution with distribution function $F(x)$
and p.d.f. $f(x) = F'(x)$. Say $F(\xi_p) = p$ and $f(\xi_p) > 0$. Consider the order
statistic $Y_{[np]}$, where $[np]$ is the greatest integer in $np$.

(a) Note that the event $\sqrt{n}(Y_{[np]} - \xi_p) \le u$ is equivalent to $Z \ge [np]$, where
$Z$ is the number of $X$-values less than or equal to $\xi_p + u/\sqrt{n}$.

(b) Write $Z \ge np$, an approximation to $Z \ge [np]$, as

$$\frac{Z - nF(\xi_p + u/\sqrt{n})}{\sqrt{np(1 - p)}} \ge \frac{-f(\xi_p)\,u}{\sqrt{p(1 - p)}},$$

approximately, using $F(\xi_p + u/\sqrt{n}) \approx p + f(\xi_p)u/\sqrt{n}$.

(c) Since the left-hand member of the inequality in part (b) is
approximately $N(0, 1)$, argue that $Y_{[np]}$ has an approximate normal
distribution with mean $\xi_p$ and variance $p(1 - p)/n[f(\xi_p)]^2$.



CHAPTER 6 


Introduction 
to Statistical 
Inference 


6.1 Point Estimation 

The first five chapters of this book deal with certain concepts
and problems of probability theory. Throughout we have carefully
distinguished between a sample space $\mathcal{C}$ of outcomes and the space
of one or more random variables defined on $\mathcal{C}$. With this chapter we
begin a study of some problems in statistics and here we are more
interested in the number (or numbers) by which an outcome is
represented than we are in the outcome itself. Accordingly, we shall
adopt a frequently used convention. We shall refer to a random
variable $X$ as the outcome of a random experiment and we shall refer
to the space of $X$ as the sample space. Were it not so awkward, we
would call $X$ the numerical outcome. Once the experiment has been
performed and it is found that $X = x$, we shall call $x$ the experimental
value of $X$ for that performance of the experiment.

This convenient terminology can be used to advantage in more
general situations. To illustrate this, let a random experiment be
repeated $n$ independent times and under identical conditions. Then
the random variables $X_1, X_2, \ldots, X_n$ (each of which assigns a
numerical value to an outcome) constitute (Section 4.1) the
observations of a random sample. If we are more concerned with the
numerical representations of the outcomes than with the outcomes
themselves, it seems natural to refer to $X_1, X_2, \ldots, X_n$ as the outcomes.
And what more appropriate name can we give to the space of a random
sample than the sample space? Once the experiment has been
performed the indicated number of times and it is found that $X_1 = x_1$,
$X_2 = x_2, \ldots, X_n = x_n$, we shall refer to $x_1, x_2, \ldots, x_n$ as the
experimental values of $X_1, X_2, \ldots, X_n$ or as the sample data.

We shall use the terminology of the two preceding paragraphs, and 
in this section we shall give some examples of statistical inference. These 
examples will be built around the notion of a point estimate of an 
unknown parameter in a p.d.f. 

Let a random variable $X$ have a p.d.f. that is of known functional
form but in which the p.d.f. depends upon an unknown parameter $\theta$
that may have any value in a set $\Omega$. This will be denoted by writing the
p.d.f. in the form $f(x; \theta)$, $\theta \in \Omega$. The set $\Omega$ will be called the parameter
space. Thus we are confronted, not with one distribution of probability,
but with a family of distributions. To each value of $\theta$, $\theta \in \Omega$, there
corresponds one member of the family. A family of probability density
functions will be denoted by the symbol $\{f(x; \theta) : \theta \in \Omega\}$. Any member
of this family of probability density functions will be denoted by the
symbol $f(x; \theta)$, $\theta \in \Omega$. We shall continue to use the special symbols that
have been adopted for the normal, the chi-square, and the binomial
distributions. We may, for instance, have the family $\{N(\theta, 1) : \theta \in \Omega\}$,
where $\Omega$ is the set $-\infty < \theta < \infty$. One member of this family of
distributions is the distribution that is $N(0, 1)$. Any arbitrary member
is $N(\theta, 1)$, $-\infty < \theta < \infty$.

Let us consider a family of probability density functions
$\{f(x; \theta) : \theta \in \Omega\}$. It may be that the experimenter needs to select
precisely one member of the family as being the p.d.f. of his random
variable. That is, he needs a point estimate of $\theta$. Let $X_1, X_2, \ldots, X_n$
denote a random sample from a distribution that has a p.d.f. which is
one member (but which member we do not know) of the family
$\{f(x; \theta) : \theta \in \Omega\}$ of probability density functions. That is, our sample
arises from a distribution that has the p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Our
problem is that of defining a statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, so
that if $x_1, x_2, \ldots, x_n$ are the observed experimental values of
$X_1, X_2, \ldots, X_n$, then the number $y_1 = u_1(x_1, x_2, \ldots, x_n)$ will be a good
point estimate of $\theta$.

The following illustration should help motivate one principle that 
is often used in finding point estimates. 

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the
distribution with p.d.f.

$$f(x; \theta) = \theta^{x}(1 - \theta)^{1 - x}, \quad x = 0, 1,$$
$$\phantom{f(x; \theta)} = 0 \quad \text{elsewhere},$$

where $0 \le \theta \le 1$. The probability that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is the
joint p.d.f.

$$\theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2} \cdots \theta^{x_n}(1-\theta)^{1-x_n} = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i},$$

where $x_i$ equals zero or 1, $i = 1, 2, \ldots, n$. This probability, which is the joint
p.d.f. of $X_1, X_2, \ldots, X_n$, may be regarded as a function of $\theta$ and, when so
regarded, is denoted by $L(\theta)$ and called the likelihood function. That is,

$$L(\theta) = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i}, \quad 0 \le \theta \le 1.$$

We might ask what value of $\theta$ would maximize the probability $L(\theta)$ of
obtaining this particular observed sample $x_1, x_2, \ldots, x_n$. Certainly, this
maximizing value of $\theta$ would seemingly be a good estimate of $\theta$ because it
would provide the largest probability of this particular sample. Since the
likelihood function $L(\theta)$ and its logarithm, $\ln L(\theta)$, are maximized for the same
value $\theta$, either $L(\theta)$ or $\ln L(\theta)$ can be used. Here

$$\ln L(\theta) = \Big(\sum_{1}^{n} x_i\Big) \ln \theta + \Big(n - \sum_{1}^{n} x_i\Big) \ln(1 - \theta);$$

so we have

$$\frac{d \ln L(\theta)}{d\theta} = \frac{\sum x_i}{\theta} - \frac{n - \sum x_i}{1 - \theta} = 0,$$

provided that $\theta$ is not equal to zero or 1. This is equivalent to the equation

$$(1 - \theta) \sum_{1}^{n} x_i = \theta \Big(n - \sum_{1}^{n} x_i\Big),$$

whose solution for $\theta$ is $\sum_1^n x_i/n$. That $\sum_1^n x_i/n$ actually maximizes $L(\theta)$ and
$\ln L(\theta)$ can be easily checked, even in the cases in which all of $x_1, x_2, \ldots, x_n$
equal zero together or 1 together. That is, $\sum_1^n x_i/n$ is the value of $\theta$ that
maximizes $L(\theta)$. The corresponding statistic,

$$\hat{\theta} = \frac{1}{n} \sum_{1}^{n} X_i = \bar{X},$$

is called the maximum likelihood estimator of $\theta$. The observed value of $\hat{\theta}$,
namely $\sum_1^n x_i/n$, is called the maximum likelihood estimate of $\theta$. For a simple
example, suppose that $n = 3$, and $x_1 = 1$, $x_2 = 0$, $x_3 = 1$; then $L(\theta) = \theta^2(1 - \theta)$
and the observed $\hat{\theta} = \frac{2}{3}$ is the maximum likelihood estimate of $\theta$.
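The calculation in Example 1 can be checked numerically. The sketch below is only an illustration: the function names are ours, not the text's, and the grid of interior values of $\theta$ is an arbitrary choice. It evaluates $\ln L(\theta)$ for the sample $1, 0, 1$ and confirms that the grid maximizer sits next to the closed-form value $\sum x_i/n$.

```python
import math

def bernoulli_log_likelihood(theta, xs):
    """ln L(theta) = (sum x_i) ln(theta) + (n - sum x_i) ln(1 - theta)."""
    total = sum(xs)
    n = len(xs)
    return total * math.log(theta) + (n - total) * math.log(1.0 - theta)

def bernoulli_mle(xs):
    """Closed-form maximizer from Example 1: the sample proportion."""
    return sum(xs) / len(xs)

xs = [1, 0, 1]                  # the sample used at the end of Example 1
theta_hat = bernoulli_mle(xs)   # equals 2/3

# Search a grid of interior values only, because ln(0) is undefined
# at the endpoints theta = 0 and theta = 1.
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, xs))
print(theta_hat, best_on_grid)
```

The grid search is deliberately crude; it is there to make the maximization visible, not to replace the derivative argument.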


The principle of the method of maximum likelihood can now be
formulated easily. Consider a random sample $X_1, X_2, \ldots, X_n$ from a
distribution having p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The joint p.d.f. of
$X_1, X_2, \ldots, X_n$ is $f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$. This joint p.d.f. may be
regarded as a function of $\theta$. When so regarded, it is called the likelihood
function $L$ of the random sample, and we write

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta), \quad \theta \in \Omega.$$

Suppose that we can find a nontrivial function of $x_1, x_2, \ldots, x_n$, say
$u(x_1, x_2, \ldots, x_n)$, such that, when $\theta$ is replaced by $u(x_1, x_2, \ldots, x_n)$, the
likelihood function $L$ is maximized. That is, $L[u(x_1, x_2, \ldots, x_n);
x_1, x_2, \ldots, x_n]$ is at least as great as $L(\theta; x_1, x_2, \ldots, x_n)$ for every $\theta \in \Omega$.
Then the statistic $u(X_1, X_2, \ldots, X_n)$ will be called a maximum likelihood
estimator (hereafter abbreviated m.l.e.) of $\theta$ and will be denoted by the
symbol $\hat{\theta} = u(X_1, X_2, \ldots, X_n)$. We remark that in many instances there
will be a unique m.l.e. $\hat{\theta}$ of a parameter $\theta$, and often it may be obtained
by the process of differentiation.


Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal
distribution $N(\theta, 1)$, $-\infty < \theta < \infty$. Here

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{1}^{n} \frac{(x_i - \theta)^2}{2}\right].$$

This function $L$ can be maximized by setting the first derivative of $L$, with
respect to $\theta$, equal to zero and solving the resulting equation for $\theta$. We note,
however, that each of the functions $L$ and $\ln L$ is maximized at the same value
of $\theta$. So it may be easier to solve

$$\frac{d \ln L(\theta; x_1, x_2, \ldots, x_n)}{d\theta} = 0.$$

For this example,

$$\frac{d \ln L(\theta; x_1, x_2, \ldots, x_n)}{d\theta} = \sum_{1}^{n} (x_i - \theta).$$

If this derivative is equated to zero, the solution for the parameter $\theta$ is
$u(x_1, x_2, \ldots, x_n) = \sum_1^n x_i/n$. That $\sum_1^n x_i/n$ actually maximizes $L$ is easily shown.
Thus the statistic

$$\hat{\theta} = u(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{1}^{n} X_i = \bar{X}$$

is the unique m.l.e. of the mean $\theta$.

It is interesting to note that in both Examples 1 and 2, it is true that
$E(\hat{\theta}) = \theta$. That is, in each of these cases, the expected value of the
estimator is equal to the corresponding parameter, which leads to the
following definition.

Definition 1. Any statistic whose mathematical expectation is equal
to a parameter $\theta$ is called an unbiased estimator of the parameter $\theta$.
Otherwise, the statistic is said to be biased.

Example 3. Let

$$f(x; \theta) = \frac{1}{\theta}, \quad 0 < x \le \theta, \quad 0 < \theta < \infty,$$
$$\phantom{f(x; \theta)} = 0 \quad \text{elsewhere},$$

and let $X_1, X_2, \ldots, X_n$ denote a random sample from this distribution. Note
that we have taken $0 < x \le \theta$ instead of $0 < x < \theta$ so as to avoid a discussion
of supremum versus maximum. Here

$$L(\theta; x_1, x_2, \ldots, x_n) = \frac{1}{\theta^{n}}, \quad 0 < x_i \le \theta,$$

which is an ever-decreasing function of $\theta$. The maximum of such functions
cannot be found by differentiation but by selecting $\theta$ as small as possible. Now
$\theta \ge$ each $x_i$; in particular, then, $\theta \ge \max(x_i)$. Thus $L$ can be made no larger
than

$$\frac{1}{[\max(x_i)]^{n}}$$

and the unique m.l.e. $\hat{\theta}$ of $\theta$ in this example is the $n$th order statistic $\max(X_i)$.
It can be shown that $E[\max(X_i)] = n\theta/(n + 1)$. Thus, in this instance, the
m.l.e. of the parameter $\theta$ is biased. That is, the property of unbiasedness is not
in general a property of a m.l.e.
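The bias claimed in Example 3 can be seen in a small simulation. This is a sketch under assumptions of our own: the values $\theta = 3$ and $n = 5$, the number of repetitions, and the seed are arbitrary illustrative choices, and the function name is hypothetical. It averages the sample maximum over many samples and compares the average with $n\theta/(n+1)$.

```python
import random

def mean_of_sample_maximum(theta, n, reps, seed=0):
    """Average of max(X_1, ..., X_n) over `reps` uniform(0, theta) samples."""
    rng = random.Random(seed)   # fixed seed so that the run is reproducible
    total = 0.0
    for _ in range(reps):
        total += max(rng.uniform(0.0, theta) for _ in range(n))
    return total / reps

theta, n = 3.0, 5               # illustrative values, not taken from the text
simulated = mean_of_sample_maximum(theta, n, reps=20000)
theoretical = n * theta / (n + 1)
print(simulated, theoretical)
```

The simulated average should fall close to $n\theta/(n+1)$, which is below $\theta$, so the m.l.e. underestimates $\theta$ on the average.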


While the m.l.e. $\hat{\theta}$ of $\theta$ in Example 3 is a biased estimator, results
in Chapter 5 show that the $n$th order statistic $\hat{\theta} = \max(X_i) = Y_n$
converges in probability to $\theta$. Thus, in accordance with the following
definition, we say that $\hat{\theta} = Y_n$ is a consistent estimator of $\theta$.

Definition 2. Any statistic that converges in probability to a
parameter $\theta$ is called a consistent estimator of that parameter $\theta$.

Consistency is a desirable property of an estimator; and, in all cases 
of practical interest, maximum likelihood estimators are consistent. 

The preceding definitions and properties are easily generalized. Let
$X, Y, \ldots, Z$ denote random variables that may or may not
be independent and that may or may not be identically distributed. Let
the joint p.d.f. $g(x, y, \ldots, z; \theta_1, \theta_2, \ldots, \theta_m)$, $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$,
depend on $m$ parameters. This joint p.d.f., when regarded as a
function of $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$, is called the likelihood function
of the random variables. Then those functions $u_1(x, y, \ldots, z)$,
$u_2(x, y, \ldots, z), \ldots, u_m(x, y, \ldots, z)$ that maximize this likelihood
function with respect to $\theta_1, \theta_2, \ldots, \theta_m$, respectively, define the
maximum likelihood estimators

$$\hat{\theta}_1 = u_1(X, Y, \ldots, Z), \quad \hat{\theta}_2 = u_2(X, Y, \ldots, Z), \quad \ldots, \quad \hat{\theta}_m = u_m(X, Y, \ldots, Z)$$

of the $m$ parameters.

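Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. We shall find $\hat{\theta}_1$ and
$\hat{\theta}_2$, the maximum likelihood estimators of $\theta_1$ and $\theta_2$. The logarithm of the
likelihood function may be written in the form

$$\ln L(\theta_1, \theta_2; x_1, \ldots, x_n) = -\frac{\sum_{1}^{n} (x_i - \theta_1)^2}{2\theta_2} - \frac{n \ln(2\pi\theta_2)}{2}.$$

We observe that we may maximize by differentiation. We have

$$\frac{\partial \ln L}{\partial \theta_1} = \frac{\sum_{1}^{n} (x_i - \theta_1)}{\theta_2}, \qquad
\frac{\partial \ln L}{\partial \theta_2} = \frac{\sum_{1}^{n} (x_i - \theta_1)^2}{2\theta_2^{2}} - \frac{n}{2\theta_2}.$$

If we equate these partial derivatives to zero and solve simultaneously the two
equations thus obtained, the solutions for $\theta_1$ and $\theta_2$ are found to be $\sum_1^n x_i/n = \bar{x}$
and $\sum_1^n (x_i - \bar{x})^2/n = s^2$, respectively. It can be verified that these
solutions maximize $L$. Thus the maximum likelihood estimators of $\theta_1 = \mu$
and $\theta_2 = \sigma^2$ are, respectively, the mean and the variance of the sample, namely
$\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$. Whereas $\hat{\theta}_1$ is an unbiased estimator of $\theta_1$, the estimator
$\hat{\theta}_2 = S^2$ is biased because

$$E(\hat{\theta}_2) = \frac{\sigma^2}{n} E\left(\frac{nS^2}{\sigma^2}\right) = \frac{(n - 1)\sigma^2}{n} = \frac{(n - 1)\theta_2}{n}.$$

However, in Chapter 5 it has been shown that $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$ converge
in probability to $\theta_1$ and $\theta_2$, respectively, and thus they are consistent estimators
of $\theta_1$ and $\theta_2$.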
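As a concrete illustration of Example 4, the following sketch computes the two maximum likelihood estimates and the rescaled variance whose expectation is $\theta_2$. The data are made up for illustration and the function names are ours.

```python
def normal_mle(xs):
    """Return (x_bar, s^2), where s^2 uses the divisor n as in Example 4."""
    n = len(xs)
    mean = sum(xs) / n
    var_mle = sum((x - mean) ** 2 for x in xs) / n
    return mean, var_mle

def unbiased_variance(xs):
    """Multiply the m.l.e. of the variance by n/(n - 1) to remove the bias."""
    n = len(xs)
    _, var_mle = normal_mle(xs)
    return n * var_mle / (n - 1)

data = [4.1, 5.3, 3.8, 6.0, 5.2]   # hypothetical observations
mean, var_mle = normal_mle(data)
print(mean, var_mle, unbiased_variance(data))
```

The factor $n/(n-1)$ comes directly from $E(S^2) = (n-1)\theta_2/n$; for large $n$ the two variance estimates differ very little.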

Suppose that we wish to estimate a function of $\theta$, say $h(\theta)$. For
convenience, let us say that $\eta = h(\theta)$ defines a one-to-one
transformation. Then the value of $\eta$, say $\hat{\eta}$, that maximizes the
likelihood function $L(\theta)$, or equivalently $L[\theta = h^{-1}(\eta)]$, is selected so
that $\hat{\theta} = h^{-1}(\hat{\eta})$, where $\hat{\theta}$ is the m.l.e. of $\theta$. Thus $\hat{\eta}$ is taken so that
$\hat{\eta} = h(\hat{\theta})$; that is,

$$\widehat{h(\theta)} = h(\hat{\theta}).$$

This result is called the invariance property of a maximum likelihood
estimator. For illustration, if $\eta = \theta^3$, where $\theta$ is the mean of $N(\theta, 1)$, then
$\hat{\eta} = \bar{X}^3$. While there is a little complication if $h(\theta)$ is not one-to-one, we
still use the fact that $\hat{\eta} = h(\hat{\theta})$. Thus if $\bar{X}$ is the mean of the sample from
$b(1, \theta)$, so that $\hat{\theta} = \bar{X}$, and if $\eta = \theta(1 - \theta)$, then $\hat{\eta} = \bar{X}(1 - \bar{X})$. These
ideas can be extended to more than one parameter. For illustration, in
Example 4, if $\eta = \theta_1 + 2\sqrt{\theta_2}$, then $\hat{\eta} = \bar{X} + 2S$.

Sometimes it is impossible to find maximum likelihood estimators
in a convenient closed form and numerical methods must be used to
maximize the likelihood function. For illustration, suppose that
$X_1, X_2, \ldots, X_n$ is a random sample from a gamma distribution with
parameters $\alpha = \theta_1$ and $\beta = \theta_2$, where $\theta_1 > 0$, $\theta_2 > 0$. It is difficult to
maximize

$$L(\theta_1, \theta_2; x_1, \ldots, x_n) = \left[\frac{1}{\Gamma(\theta_1)\,\theta_2^{\theta_1}}\right]^{n} (x_1 x_2 \cdots x_n)^{\theta_1 - 1} \exp\left(-\sum_{1}^{n} \frac{x_i}{\theta_2}\right)$$

with respect to $\theta_1$ and $\theta_2$, owing to the presence of the gamma function
$\Gamma(\theta_1)$. Thus numerical methods must be used to maximize $L$ once
$x_1, x_2, \ldots, x_n$ are observed.
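One possible numerical scheme, offered only as a sketch and not as the method the text has in mind, exploits the fact that for fixed $\theta_1$ the likelihood is maximized at $\theta_2 = \bar{x}/\theta_1$. Substituting this value leaves a function of $\theta_1$ alone, which has a single peak and can be maximized by a golden-section search. The search bounds, the tolerance, the data, and the function names below are all assumptions made for illustration.

```python
import math

def gamma_profile_loglik(alpha, xs):
    """ln L in theta_1 = alpha after replacing theta_2 by x_bar / alpha."""
    n = len(xs)
    xbar = sum(xs) / n
    sum_log = sum(math.log(x) for x in xs)
    beta = xbar / alpha
    # With beta = x_bar / alpha the term sum(x_i) / beta equals n * alpha.
    return (-n * math.lgamma(alpha) - n * alpha * math.log(beta)
            + (alpha - 1.0) * sum_log - n * alpha)

def gamma_mle(xs, lo=1e-3, hi=1e3, tol=1e-10):
    """Golden-section search for alpha on [lo, hi]; then beta = x_bar / alpha."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > tol:
        if gamma_profile_loglik(c, xs) < gamma_profile_loglik(d, xs):
            a = c               # the maximum lies to the right of c
        else:
            b = d               # the maximum lies to the left of d
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
    alpha = (a + b) / 2.0
    return alpha, (sum(xs) / len(xs)) / alpha

data = [2.1, 3.4, 1.2, 5.6, 2.8, 4.0, 3.3, 1.9]   # hypothetical observations
alpha_hat, beta_hat = gamma_mle(data)
print(alpha_hat, beta_hat)
```

Reducing the problem to one dimension avoids the need for the derivative of $\ln \Gamma(\theta_1)$; a Newton-type iteration on both parameters would be a reasonable alternative when that derivative is available.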

There are other ways, however, to obtain easily point estimates of
$\theta_1$ and $\theta_2$. For illustration, in the gamma distribution situation, let us
simply equate the first two moments of the distribution to the
corresponding moments of the sample. This seems like a reasonable
way in which to find estimators, since the empirical distribution $F_n(x)$
converges in probability to $F(x)$, and hence corresponding moments
should be about equal. Here in this illustration we have

$$\theta_1 \theta_2 = \bar{X}, \qquad \theta_1 \theta_2^{2} = S^2,$$

the solutions of which are

$$\tilde{\theta}_1 = \frac{\bar{X}^2}{S^2} \quad \text{and} \quad \tilde{\theta}_2 = \frac{S^2}{\bar{X}}.$$

We say that these latter two statistics, $\tilde{\theta}_1$ and $\tilde{\theta}_2$, are respective
estimators of $\theta_1$ and $\theta_2$ found by the method of moments.
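The two moment equations for the gamma distribution translate into a few lines of code. The sketch below uses made-up data and a function name of our own; the variance uses the divisor $n$, matching the definition of $S^2$ in this chapter.

```python
def gamma_method_of_moments(xs):
    """Solve theta_1 * theta_2 = x_bar and theta_1 * theta_2**2 = s^2."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n   # divisor n, as in the text
    return xbar ** 2 / s2, s2 / xbar

theta1, theta2 = gamma_method_of_moments([1.0, 2.0, 3.0, 6.0])
print(theta1, theta2)
```

For these four values $\bar{x} = 3$ and $s^2 = 3.5$, so the estimates reproduce both sample moments exactly, which is all the method asks of them.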

To generalize the discussion of the preceding paragraph, let
$X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with
p.d.f. $f(x; \theta_1, \theta_2, \ldots, \theta_r)$, $(\theta_1, \ldots, \theta_r) \in \Omega$. The expectation $E(X^k)$ is
frequently called the $k$th moment of the distribution, $k = 1, 2, 3, \ldots$.
The sum $M_k = \sum_1^n X_i^k/n$ is the $k$th moment of the sample,
$k = 1, 2, 3, \ldots$. The method of moments can be described as follows.
Equate $E(X^k)$ to $M_k$, beginning with $k = 1$ and continuing until there
are enough equations to provide unique solutions for $\theta_1, \theta_2, \ldots, \theta_r$,
say $h_i(M_1, M_2, \ldots)$, $i = 1, 2, \ldots, r$, respectively. It should be noted
that this could be done in an equivalent manner by equating $\mu = E(X)$
to $\bar{X}$ and $E[(X - \mu)^k]$ to $\sum_1^n (X_i - \bar{X})^k/n$, $k = 2, 3$, and so on until unique
solutions for $\theta_1, \theta_2, \ldots, \theta_r$ are obtained. This alternative procedure
was used in the preceding illustration. In most practical cases, the
estimator $\tilde{\theta}_i = h_i(M_1, M_2, \ldots)$ of $\theta_i$, found by the method of moments,
is a consistent estimator of $\theta_i$, $i = 1, 2, \ldots, r$.

EXERCISES 

6.1. Let $X_1, X_2, \ldots, X_n$ represent a random sample from each of the
distributions having the following probability density functions:

(a) $f(x; \theta) = \theta^{x} e^{-\theta}/x!$, $x = 0, 1, 2, \ldots$, $0 \le \theta < \infty$, zero elsewhere, where
$f(0; 0) = 1$.

(b) $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, $0 < \theta < \infty$, zero elsewhere.

(c) $f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere.

(d) $f(x; \theta) = \frac{1}{2} e^{-|x - \theta|}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.

(e) $f(x; \theta) = e^{-(x - \theta)}$, $\theta \le x < \infty$, $-\infty < \theta < \infty$, zero elsewhere.

In each case find the m.l.e. $\hat{\theta}$ of $\theta$.

6.2. Let $X_1, X_2, \ldots, X_n$ be i.i.d., each with the distribution having p.d.f.
$f(x; \theta_1, \theta_2) = (1/\theta_2) e^{-(x - \theta_1)/\theta_2}$, $\theta_1 \le x < \infty$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$,
zero elsewhere. Find the maximum likelihood estimators of $\theta_1$ and $\theta_2$.

6.3. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from
a distribution with p.d.f. $f(x; \theta) = 1$, $\theta - \frac{1}{2} \le x \le \theta + \frac{1}{2}$, $-\infty < \theta < \infty$,
zero elsewhere. Show that every statistic $u(X_1, X_2, \ldots, X_n)$ such that

$$Y_n - \tfrac{1}{2} \le u(X_1, X_2, \ldots, X_n) \le Y_1 + \tfrac{1}{2}$$

is a m.l.e. of $\theta$. In particular, $(4Y_1 + 2Y_n + 1)/6$, $(Y_1 + Y_n)/2$, and
$(2Y_1 + 4Y_n - 1)/6$ are three such statistics. Thus uniqueness is not in general a
property of a m.l.e.

6.4. Let $X_1$, $X_2$, and $X_3$ have the multinomial distribution in which $n = 25$,
$k = 4$, and the unknown probabilities are $\theta_1$, $\theta_2$, and $\theta_3$, respectively.
Here we can, for convenience, let $X_4 = 25 - X_1 - X_2 - X_3$ and
$\theta_4 = 1 - \theta_1 - \theta_2 - \theta_3$. If the observed values of the random variables are
$x_1 = 4$, $x_2 = 11$, and $x_3 = 7$, find the maximum likelihood estimates of $\theta_1$,
$\theta_2$, and $\theta_3$.

6.5. The Pareto distribution is frequently used as a model in the study of incomes
and has the distribution function

$$F(x; \theta_1, \theta_2) = 1 - (\theta_1/x)^{\theta_2}, \quad \theta_1 \le x, \quad \text{zero elsewhere},$$

where $\theta_1 > 0$ and $\theta_2 > 0$.

If $X_1, X_2, \ldots, X_n$ is a random sample from this distribution, find the
maximum likelihood estimators of $\theta_1$ and $\theta_2$.

6.6. Let $Y_n$ be a statistic such that $\lim_{n \to \infty} E(Y_n) = \theta$ and $\lim_{n \to \infty} \sigma^2_{Y_n} = 0$. Prove that
$Y_n$ is a consistent estimator of $\theta$.

Hint: $\Pr(|Y_n - \theta| \ge \epsilon) \le E[(Y_n - \theta)^2]/\epsilon^2$ and $E[(Y_n - \theta)^2] = [E(Y_n - \theta)]^2 + \sigma^2_{Y_n}$. Why?

6.7. For each of the distributions in Exercise 6.1, find an estimator of $\theta$ by
the method of moments and show that it is consistent.

6.8. If a random sample of size $n$ is taken from a distribution having p.d.f.
$f(x; \theta) = 2x/\theta^2$, $0 < x \le \theta$, zero elsewhere, find:

(a) The m.l.e. $\hat{\theta}$ for $\theta$.

(b) The constant $c$ so that $E(c\hat{\theta}) = \theta$.

(c) The m.l.e. for the median of the distribution.

6.9. Let $X_1, X_2, \ldots, X_n$ be i.i.d., each with a distribution with p.d.f.
$f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere. Find the m.l.e. of $\Pr(X \le 2)$.

6.10. Let $X$ have a binomial distribution with parameters $n$ and $p$. The
variance of $X/n$ is $p(1 - p)/n$; this is sometimes estimated by the m.l.e.
$(X/n)(1 - X/n)/n$. Is this an unbiased estimator of $p(1 - p)/n$? If not, can you
construct one by multiplying this one by a constant?

6.11. Let the table

    x          0   1   2   3   4   5
    Frequency  6  10  14  13   6   1

represent a summary of a sample of size 50 from a binomial distribution
having $n = 5$. Find the m.l.e. of $\Pr(X \ge 3)$.

6.12. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the uniform distribution of the continuous type over the closed
interval $[\theta - \rho, \theta + \rho]$. Find the maximum likelihood estimators for $\theta$ and
$\rho$. Are these two unbiased estimators?

6.13. Let $X_1, X_2, \ldots, X_5$ be a random sample from a Cauchy distribution
with median $\theta$, that is, with p.d.f.

$$f(x; \theta) = \frac{1}{\pi} \, \frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < \infty,$$

where $-\infty < \theta < \infty$. If $x_1 = -1.94$, $x_2 = 0.59$, $x_3 = -5.98$,
$x_4 = -0.08$, $x_5 = -0.77$, find by numerical methods the m.l.e. of $\theta$.

6.2 Confidence Intervals for Means 

Suppose that we are willing to accept as a fact that the (numerical)
outcome of a random experiment is a random variable that has a
normal distribution with known variance $\sigma^2$ but unknown mean $\mu$.
That is, $\mu$ is some constant, but its value is unknown. To elicit some
information about $\mu$, we decide to repeat the random experiment
under identical conditions $n$ independent times, $n$ being a fixed
positive integer. Let the random variables $X_1, X_2, \ldots, X_n$ denote,
respectively, the outcomes to be obtained on these $n$ repetitions of the
experiment. If our assumptions are fulfilled, we then have under
consideration a random sample $X_1, X_2, \ldots, X_n$ from a distribution
that is $N(\mu, \sigma^2)$, $\sigma^2$ known. Consider the maximum likelihood
estimator of $\mu$, namely $\hat{\mu} = \bar{X}$. Of course, $\bar{X}$ is $N(\mu, \sigma^2/n)$ and
$(\bar{X} - \mu)/(\sigma/\sqrt{n})$ is $N(0, 1)$. Thus

$$\Pr\left(-2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2\right) = 0.954.$$

However, the events

$$-2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2,$$

$$-\frac{2\sigma}{\sqrt{n}} < \bar{X} - \mu < \frac{2\sigma}{\sqrt{n}},$$

and

$$\bar{X} - \frac{2\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{2\sigma}{\sqrt{n}}$$

are equivalent. Thus these events have the same probability. That is,

$$\Pr\left(\bar{X} - \frac{2\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{2\sigma}{\sqrt{n}}\right) = 0.954.$$

Since $\sigma$ is a known number, each of the random variables $\bar{X} - 2\sigma/\sqrt{n}$
and $\bar{X} + 2\sigma/\sqrt{n}$ is a statistic. The interval $(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$
is a random interval. In this case, both end points of the interval are
statistics. The immediately preceding probability statement can be
read: Prior to the repeated independent performances of the random
experiment, the probability is 0.954 that the random interval
$(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$ includes the unknown fixed point
(parameter) $\mu$.
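The statement that the random interval includes $\mu$ with probability 0.954 refers to repeated performances of the experiment, and a simulation makes this concrete. The sketch below is illustrative only: the values of $\mu$, $\sigma$, $n$, the number of repetitions, and the seed are arbitrary choices of ours.

```python
import math
import random

def coverage_of_two_sigma_interval(mu, sigma, n, reps, seed=0):
    """Fraction of samples whose interval x_bar +/- 2*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    half_width = 2.0 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

coverage = coverage_of_two_sigma_interval(mu=5.0, sigma=3.0, n=10, reps=20000)
print(coverage)
```

The observed fraction should be near 0.954. Each individual interval either contains $\mu$ or it does not; only the long-run proportion is governed by the probability statement.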

Up to this point, only probability has been involved;
the determination of the p.d.f. of $\bar{X}$ and the determination of
the random interval were problems of probability. Now the
problem becomes statistical. Suppose the experiment yields
$X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$. Then the sample value of $\bar{X}$ is
$\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, a known number. Moreover, since $\sigma$
is known, the interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$ has known
endpoints. Obviously, we cannot say that 0.954 is the probability that
the particular interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$ includes the
parameter $\mu$, for $\mu$, although unknown, is some constant, and this
particular interval either does or does not include $\mu$. However, the
fact that we had such a high probability, prior to the performance of
the experiment, that the random interval $(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$
includes the fixed point (parameter) $\mu$, leads us to have some
reliance on the particular interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$. This
reliance is reflected by calling the known interval $(\bar{x} - 2\sigma/\sqrt{n},
\bar{x} + 2\sigma/\sqrt{n})$ a 95.4 percent confidence interval for $\mu$. The number 0.954
is called the confidence coefficient. The confidence coefficient is equal
to the probability that the random interval includes the parameter. One
may, of course, obtain an 80, a 90, or a 99 percent confidence interval
for $\mu$ by using 1.282, 1.645, or 2.576, respectively, instead of the
constant 2.

A statistical inference of this sort is an example of interval
estimation of a parameter. Note that the interval estimate of $\mu$ is found
by taking a good (here maximum likelihood) estimate $\bar{x}$ of $\mu$ and adding
and subtracting twice the standard deviation of $\bar{X}$, namely
$2\sigma/\sqrt{n}$, which is small if $n$ is large. If $\sigma$ were not known, the end points
of the random interval would not be statistics. Although the
probability statement about the random interval remains valid, the sample
data would not yield an interval with known end points.

Example 1. If in the preceding discussion $n = 40$, $\sigma^2 = 10$, and $\bar{x} = 7.164$,
then $(7.164 - 1.282\sqrt{10/40},\ 7.164 + 1.282\sqrt{10/40})$, or $(6.523, 7.805)$, is an 80 percent
confidence interval for $\mu$. Thus we have an interval estimate of $\mu$.
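The arithmetic of Example 1 is easy to reproduce. The sketch below obtains the normal percentage point from Python's standard library (`statistics.NormalDist`) rather than from a table; the function name and its parameters are our own.

```python
import math
from statistics import NormalDist

def normal_mean_interval(xbar, sigma2, n, confidence):
    """Interval x_bar +/- z * sqrt(sigma^2 / n) for a known variance sigma^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.282 for 0.80
    half_width = z * math.sqrt(sigma2 / n)
    return xbar - half_width, xbar + half_width

low, high = normal_mean_interval(7.164, 10.0, 40, 0.80)
print(round(low, 3), round(high, 3))   # 6.523 7.805
```

Changing the last argument to 0.90 or 0.99 gives the intervals based on 1.645 and 2.576 mentioned earlier in this section.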

In the next example we show how the central limit theorem may
be used to help us find an approximate confidence interval for $\mu$ when
our sample arises from a distribution that is not normal.

Example 2. Let $\bar{X}$ denote the mean of a random sample of size 25 from
a distribution having variance $\sigma^2 = 100$, and mean $\mu$. Since $\sigma/\sqrt{n} = 2$,
then approximately

$$\Pr\left(-1.96 < \frac{\bar{X} - \mu}{2} < 1.96\right) = 0.95,$$

or

$$\Pr(\bar{X} - 3.92 < \mu < \bar{X} + 3.92) = 0.95.$$

Let the observed mean of the sample be $\bar{x} = 67.53$. Accordingly, the interval
from $\bar{x} - 3.92 = 63.61$ to $\bar{x} + 3.92 = 71.45$ is an approximate 95 percent
confidence interval for the mean $\mu$.

Let us now turn to the problem of finding a confidence interval for
the mean $\mu$ of a normal distribution when we are not so fortunate as
to know the variance $\sigma^2$. From Section 4.8, we know that

$$\frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{nS^2/[\sigma^2(n - 1)]}} = \frac{\bar{X} - \mu}{S/\sqrt{n - 1}}$$

has a $t$-distribution with $n - 1$ degrees of freedom, whatever the value
of $\sigma^2 > 0$. For a given positive integer $n$ and a probability of 0.95, say,
we can find a number $b$ from Table IV in Appendix B, such that

$$\Pr\left(-b < \frac{\bar{X} - \mu}{S/\sqrt{n - 1}} < b\right) = 0.95,$$

which can be written in the form

$$\Pr\left(\bar{X} - \frac{bS}{\sqrt{n - 1}} < \mu < \bar{X} + \frac{bS}{\sqrt{n - 1}}\right) = 0.95.$$

Then the interval $[\bar{X} - (bS/\sqrt{n - 1}), \bar{X} + (bS/\sqrt{n - 1})]$ is a random
interval having probability 0.95 of including the unknown fixed point
(parameter) $\mu$. If the experimental values of $X_1, X_2, \ldots, X_n$ are
$x_1, x_2, \ldots, x_n$ with $s^2 = \sum_1^n (x_i - \bar{x})^2/n$, where $\bar{x} = \sum_1^n x_i/n$, then the
interval $[\bar{x} - (bs/\sqrt{n - 1}), \bar{x} + (bs/\sqrt{n - 1})]$ is a 95 percent confidence
interval for $\mu$ for every $\sigma^2 > 0$. Again this interval estimate of $\mu$ is found
by adding and subtracting a quantity, here $bs/\sqrt{n - 1}$, to the point
estimate $\bar{x}$.

Example 3. If in the preceding discussion $n = 10$, $\bar{x} = 3.22$, and $s = 1.17$,
then the interval $[3.22 - (2.262)(1.17)/\sqrt{9},\ 3.22 + (2.262)(1.17)/\sqrt{9}]$ or
$(2.34, 4.10)$ is a 95 percent confidence interval for $\mu$.
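The computation in Example 3 can be written as a short function. In this sketch the percentage point $b$ of the $t$-distribution is passed in by the caller, as it would be read from a table, and $s$ is the sample standard deviation computed with the divisor $n$, which is why $\sqrt{n - 1}$ appears in the half-width. The function name is ours.

```python
import math

def t_mean_interval(xbar, s, n, b):
    """Interval x_bar +/- b*s/sqrt(n - 1), where s uses the divisor n."""
    half_width = b * s / math.sqrt(n - 1)
    return xbar - half_width, xbar + half_width

low, high = t_mean_interval(3.22, 1.17, 10, 2.262)
print(round(low, 2), round(high, 2))   # 2.34 4.1
```

If $s$ were instead computed with the divisor $n - 1$, the half-width would be $bs/\sqrt{n}$; the two conventions give the same interval.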

Remark. If one wishes to find a confidence interval for $\mu$ and if the
variance $\sigma^2$ of the nonnormal distribution is unknown (unlike Example 2 of
this section), he may with large samples proceed as follows. If certain weak
conditions are satisfied, then $S^2$, the variance of a random sample of size $n \ge 2$,
converges in probability to $\sigma^2$. Then in

$$\frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{nS^2/[(n - 1)\sigma^2]}} = \frac{\sqrt{n - 1}\,(\bar{X} - \mu)}{S}$$

the numerator of the left-hand member has a limiting distribution that is
$N(0, 1)$ and the denominator of that member converges in probability to 1.
Thus $\sqrt{n - 1}\,(\bar{X} - \mu)/S$ has a limiting distribution that is $N(0, 1)$. This fact
enables us to find approximate confidence intervals for $\mu$ when our conditions
are satisfied. This procedure works particularly well when the underlying
nonnormal distribution is symmetric, because then $\bar{X}$ and $S^2$ are uncorrelated
(the proof of which is beyond the level of the text). As the underlying
distribution becomes more skewed, however, the sample size must be larger
to achieve good approximations to the desired probabilities. A similar
procedure can be followed in the next section when seeking confidence
intervals for the difference of the means of two nonnormal distributions.

We shall now consider the problem of determining a confidence
interval for the unknown parameter $p$ of a binomial distribution when
the parameter $n$ is known. Let $Y$ be $b(n, p)$, where $0 < p < 1$ and $n$ is
known. Then $p$ is the mean of $Y/n$. We shall use a result of Example
1, Section 5.5, to find an approximate 95.4 percent confidence interval
for the mean $p$. There we found that

$$\Pr\left[-2 < \frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} < 2\right] = 0.954,$$

approximately. Since

$$\frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} = \frac{(Y/n) - p}{\sqrt{(Y/n)(1 - Y/n)/n}},$$

the probability statement above can easily be written in the form

$$\Pr\left[\frac{Y}{n} - 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}} < p < \frac{Y}{n} + 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}}\right] = 0.954,$$

approximately. Thus, for large $n$, if the experimental value of $Y$ is $y$,
the interval

$$\left[\frac{y}{n} - 2\sqrt{\frac{(y/n)(1 - y/n)}{n}},\ \frac{y}{n} + 2\sqrt{\frac{(y/n)(1 - y/n)}{n}}\right]$$

provides an approximate 95.4 percent confidence interval for $p$.

A more complicated approximate 95.4 percent confidence interval
can be obtained from the fact that $Z = (Y - np)/\sqrt{np(1 - p)}$ has a
limiting distribution that is $N(0, 1)$, and the fact that the event
$-2 < Z < 2$ is equivalent to the event

$$\frac{Y + 2 - 2\sqrt{[Y(n - Y)/n] + 1}}{n + 4} < p < \frac{Y + 2 + 2\sqrt{[Y(n - Y)/n] + 1}}{n + 4}. \tag{1}$$

The first of these facts was established in Chapter 5, and the proof of
inequalities (1) is left as an exercise. Thus an experimental value $y$ of
$Y$ may be used in inequalities (1) to determine an approximate 95.4
percent confidence interval for $p$.

If one wishes a 95 percent confidence interval for $p$ that does not
depend upon limiting distribution theory, he or she may use the
following approach. (This approach is quite general and can be used
in other instances; see Exercise 6.21.) Determine two increasing
functions of $p$, say $c_1(p)$ and $c_2(p)$, such that for each value of $p$ we have,
at least approximately,

$$\Pr[c_1(p) < Y < c_2(p)] = 0.95.$$

The reason that this may be approximate is due to the fact that $Y$ has
a distribution of the discrete type and thus it is, in general, impossible
to achieve the probability 0.95 exactly. With $c_1(p)$ and $c_2(p)$ increasing
functions, they have single-valued inverses, say $d_1(y)$ and $d_2(y)$,
respectively. Thus the events $c_1(p) < Y < c_2(p)$ and $d_2(Y) < p < d_1(Y)$
are equivalent and we have, at least approximately,

$$\Pr[d_2(Y) < p < d_1(Y)] = 0.95.$$

In the case of the binomial distribution, the functions $c_1(p)$, $c_2(p)$, $d_2(y)$,
and $d_1(y)$ cannot be found explicitly, but a number of books provide
tables of $d_2(y)$ and $d_1(y)$ for various values of $n$.

Example 4. If, in the preceding discussion, we take $n = 100$ and $y = 20$, the
first approximate 95.4 percent confidence interval is given by
$(0.2 - 2\sqrt{(0.2)(0.8)/100},\ 0.2 + 2\sqrt{(0.2)(0.8)/100})$ or $(0.12, 0.28)$. The
approximate 95.4 percent confidence interval provided by inequalities (1) is

$$\left(\frac{22 - 2\sqrt{(1600/100) + 1}}{104},\ \frac{22 + 2\sqrt{(1600/100) + 1}}{104}\right)$$

or $(0.13, 0.29)$. By referring to the appropriate tables found elsewhere, we find
that an approximate 95 percent confidence interval has the limits $d_2(20) = 0.13$
and $d_1(20) = 0.29$. Thus, in this example, we see that all three methods yield
results that are in substantial agreement.
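The first two intervals of Example 4 can be computed directly. The sketch below implements the simple interval and inequalities (1); the function names are ours, and the constant 2 is the one used in the text for the 95.4 percent level.

```python
import math

def simple_binomial_interval(y, n, z=2.0):
    """y/n +/- z * sqrt((y/n)(1 - y/n)/n)."""
    p_hat = y / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def inequalities_1_interval(y, n):
    """End points of inequalities (1), which correspond to -2 < Z < 2."""
    root = 2.0 * math.sqrt(y * (n - y) / n + 1.0)
    return (y + 2.0 - root) / (n + 4.0), (y + 2.0 + root) / (n + 4.0)

print([round(v, 2) for v in simple_binomial_interval(20, 100)])   # [0.12, 0.28]
print([round(v, 2) for v in inequalities_1_interval(20, 100)])    # [0.13, 0.29]
```

The interval from inequalities (1) is not centered at $y/n$; its midpoint is $(y + 2)/(n + 4)$, which pulls the estimate slightly toward $\frac{1}{2}$.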

Remark. The fact that the variance of $Y/n$ is a function of $p$ caused us
some difficulty in finding a confidence interval for $p$. Another way of handling
the problem is to try to find a function $u(Y/n)$ of $Y/n$, whose variance is
essentially free of $p$. In Section 5.4, we proved that

$$u\left(\frac{Y}{n}\right) = \arcsin\sqrt{\frac{Y}{n}}$$

has an approximate normal distribution with mean $\arcsin\sqrt{p}$ and variance
$1/4n$. Hence we could find an approximate 95.4 percent confidence interval by
using

$$\Pr\left(-2 < \frac{\arcsin\sqrt{Y/n} - \arcsin\sqrt{p}}{\sqrt{1/4n}} < 2\right) = 0.954$$

and solving the inequalities for $p$.
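Solving these inequalities for $p$ is straightforward because $\sin^2$ is increasing on $[0, \pi/2]$: the limits are $\sin^2(\arcsin\sqrt{y/n} \mp 1/\sqrt{n})$. The sketch below carries this out; the function name is ours, and the clamping of the angles to $[0, \pi/2]$ is a safeguard we have added for values of $y/n$ near 0 or 1.

```python
import math

def arcsine_binomial_interval(y, n, z=2.0):
    """Invert -z < (asin(sqrt(y/n)) - asin(sqrt(p))) / sqrt(1/(4n)) < z for p."""
    center = math.asin(math.sqrt(y / n))
    half_width = z / (2.0 * math.sqrt(n))          # z * sqrt(1/(4n))
    lo_angle = max(0.0, center - half_width)       # keep the angle in [0, pi/2]
    hi_angle = min(math.pi / 2.0, center + half_width)
    return math.sin(lo_angle) ** 2, math.sin(hi_angle) ** 2

low, high = arcsine_binomial_interval(20, 100)
print(low, high)
```

With $n = 100$ and $y = 20$ the limits come out close to those of the other approximate intervals for the same data.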

Example 5. Suppose that we sample from a distribution with unknown
mean $\mu$ and variance $\sigma^2 = 225$. We want to find the sample size $n$ so that $\bar{x} \pm 1$
(which means $\bar{x} - 1$ to $\bar{x} + 1$) serves as a 95 percent confidence interval
for $\mu$. Using the fact that the sample mean of the observations, $\bar{X}$, is
approximately $N(\mu, \sigma^2/n)$, we see that the interval given by $\bar{x} \pm 1.96(15/\sqrt{n})$
will serve as an approximate 95 percent confidence interval for $\mu$. That is, we
want

$$1.96\left(\frac{15}{\sqrt{n}}\right) = 1,$$

or, equivalently,

$$\sqrt{n} = 29.4, \quad \text{and thus} \quad n \approx 864.36$$

or $n = 865$ because $n$ must be an integer. Suppose, however, we could not
afford to take 865 observations. In that case, the accuracy or confidence level
could possibly be relaxed some. For illustration, rather than requiring $\bar{x} \pm 1$
to be a 95 percent confidence interval for $\mu$, possibly $\bar{x} \pm 2$ would be a
satisfactory 80 percent one. If this modification is acceptable, we now have

$$1.282\left(\frac{15}{\sqrt{n}}\right) = 2,$$

or, equivalently,

$$\sqrt{n} = 9.615 \quad \text{and} \quad n \approx 92.4.$$

Since $n$ must be an integer, we would probably use 93 in practice. Most likely,
the persons involved in this project would find this is a more reasonable sample
size.
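The sample-size calculation of Example 5 amounts to solving $z\sigma/\sqrt{n} = h$ for $n$ and rounding up. A minimal sketch follows; the function and parameter names are ours.

```python
import math

def sample_size_for_half_width(sigma, half_width, z):
    """Smallest integer n with z * sigma / sqrt(n) <= half_width."""
    return math.ceil((z * sigma / half_width) ** 2)

print(sample_size_for_half_width(15, 1, 1.96))    # 865
print(sample_size_for_half_width(15, 2, 1.282))   # 93
```

Because $n$ grows with the square of $z\sigma/h$, halving the desired half-width quadruples the required sample size.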


EXERCISES 

6.14. Let the observed value of the mean $\bar{X}$ of a random sample of size 20 from
a distribution that is $N(\mu, 80)$ be 81.2. Find a 95 percent confidence interval
for $\mu$.

6.15. Let $\bar{X}$ be the mean of a random sample of size $n$ from a distribution that
is $N(\mu, 9)$. Find $n$ such that $\Pr(\bar{X} - 1 < \mu < \bar{X} + 1) = 0.90$, approximately.

6.16. Let a random sample of size 17 from the normal distribution $N(\mu, \sigma^2)$
yield $\bar{x} = 4.7$ and $s^2 = 5.76$. Determine a 90 percent confidence interval for $\mu$.

6.17. Let $\bar{X}$ denote the mean of a random sample of size $n$ from a distribution
that has mean $\mu$ and variance $\sigma^2 = 10$. Find $n$ so that the probability is
approximately 0.954 that the random interval $(\bar{X} - \frac{1}{2}, \bar{X} + \frac{1}{2})$ includes $\mu$.

6.18. Let $X_1, X_2, \ldots, X_9$ be a random sample of size 9 from a distribution that
is $N(\mu, \sigma^2)$.

(a) If $\sigma$ is known, find the length of a 95 percent confidence interval for $\mu$
if this interval is based on the random variable $\sqrt{9}(\bar{X} - \mu)/\sigma$.

(b) If $\sigma$ is unknown, find the expected value of the length of a 95 percent
confidence interval for $\mu$ if this interval is based on the random variable
$\sqrt{8}(\bar{X} - \mu)/S$.

Hint: Write $E(S) = (\sigma/\sqrt{n}) E[(nS^2/\sigma^2)^{1/2}]$.

(c) Compare these two answers.

6.19. Let $X_1, X_2, \ldots, X_n, X_{n+1}$ be a random sample of size $n + 1$, $n > 1$, from
a distribution that is $N(\mu, \sigma^2)$. Let $\bar{X} = \sum_1^n X_i/n$ and $S^2 = \sum_1^n (X_i - \bar{X})^2/n$. Find
the constant $c$ so that the statistic $c(\bar{X} - X_{n+1})/S$ has a $t$-distribution. If
$n = 8$, determine $k$ such that $\Pr(\bar{X} - kS < X_9 < \bar{X} + kS) = 0.80$. The
observed interval $(\bar{x} - ks, \bar{x} + ks)$ is often called an 80 percent prediction
interval for $X_9$.

6.20. Let $Y$ be $b(300, p)$. If the observed value of $Y$ is $y = 75$, find an
approximate 90 percent confidence interval for $p$.

6.21. Let $\bar{X}$ be the mean of a random sample of size $n$ from a distribution that
is $N(\mu, \sigma^2)$, where the positive variance $\sigma^2$ is known. Use the fact that
$\Phi(2) - \Phi(-2) = 0.954$ to find, for each $\mu$, $c_1(\mu)$ and $c_2(\mu)$ such that
$\Pr[c_1(\mu) < \bar{X} < c_2(\mu)] = 0.954$. Note that $c_1(\mu)$ and $c_2(\mu)$ are increasing
functions of $\mu$. Solve for the respective functions $d_1(\bar{x})$ and $d_2(\bar{x})$; thus we
also have that $\Pr[d_2(\bar{X}) < \mu < d_1(\bar{X})] = 0.954$. Compare this with the
answer obtained previously in the text.

6.22. In the notation of the discussion of the confidence interval for $p$, show
that the event $-2 < Z < 2$ is equivalent to inequalities (1).

Hint: First observe that $-2 < Z < 2$ is equivalent to $Z^2 < 4$, which can
be written as an inequality involving a quadratic expression in $p$.

6.23. Let $\bar{X}$ denote the mean of a random sample of size 25 from a
gamma-type distribution with $\alpha = 4$ and $\beta > 0$. Use the central limit
theorem to find an approximate 0.954 confidence interval for $\mu$, the mean
of the gamma distribution.

Hint: Base the confidence interval on the random variable
$(\bar{X} - 4\beta)/(4\beta^2/25)^{1/2} = 5\bar{X}/2\beta - 10$.

6.24. Let $\bar{x}$ be the observed mean of a random sample of size $n$ from a
distribution having mean $\mu$ and known variance $\sigma^2$. Find $n$ so that $\bar{x} - \sigma/4$
to $\bar{x} + \sigma/4$ is an approximate 95 percent confidence interval for $\mu$.

6.25. Assume a binomial model for a certain random variable. If we desire
a 90 percent confidence interval for $p$ that is at most 0.02 in length, find $n$.
Hint: Note that $\sqrt{(y/n)(1 - y/n)} \le \sqrt{(\tfrac{1}{2})(1 - \tfrac{1}{2})}$.

6.26. It is known that a random variable $X$ has a Poisson distribution with
parameter $\mu$. A sample of 200 observations from this population has a mean
equal to 3.4. Construct an approximate 90 percent confidence interval
for $\mu$.

6.27. Let $Y_1 < Y_2 < \cdots < Y_n$ denote the order statistics of a random sample
of size $n$ from a distribution that has p.d.f. $f(x) = 3x^2/\theta^3$, $0 < x < \theta$, zero
elsewhere.

(a) Show that $\Pr(c < Y_n/\theta < 1) = 1 - c^{3n}$, where $0 < c < 1$.

(b) If $n$ is 4 and if the observed value of $Y_4$ is 2.3, what is a 95 percent
confidence interval for $\theta$?

6.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where both
parameters $\mu$ and $\sigma^2$ are unknown. A confidence interval for $\sigma^2$ can be found
as follows. We know that $nS^2/\sigma^2$ is $\chi^2(n - 1)$. Thus we can find constants
$a$ and $b$ so that $\Pr(nS^2/\sigma^2 < b) = 0.975$ and $\Pr(a < nS^2/\sigma^2 < b) = 0.95$.

(a) Show that this second probability statement can be written as
$\Pr(nS^2/b < \sigma^2 < nS^2/a) = 0.95$.

(b) If $n = 9$ and $s^2 = 7.63$, find a 95 percent confidence interval for $\sigma^2$.

(c) If $\mu$ is known, how would you modify the preceding procedure for
finding a confidence interval for $\sigma^2$?

6.29. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with
known parameter $\alpha = 3$ and unknown $\beta > 0$. Discuss the construction of
a confidence interval for $\beta$.

Hint: What is the distribution of $2\sum_{i=1}^{n} X_i/\beta$? Follow the procedure
outlined in Exercise 6.28.


6.3 Confidence Intervals for Differences of Means 

The random variable $T$ may also be used to obtain a confidence
interval for the difference $\mu_1 - \mu_2$ between the means of two normal
distributions, say $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, when the distributions have
the same, but unknown, variance $\sigma^2$.

Remark. Let $X$ have a normal distribution with unknown parameters $\mu_1$
and $\sigma^2$. A modification can be made in conducting the experiment so that the
variance of the distribution will remain the same but the mean of the
distribution will be changed; say, increased. After the modification has been
effected, let the random variable be denoted by $Y$, and let $Y$ have a normal
distribution with unknown parameters $\mu_2$ and $\sigma^2$. Naturally, it is hoped that
$\mu_2$ is greater than $\mu_1$, that is, that $\mu_1 - \mu_2 < 0$. Accordingly, one seeks a
confidence interval for $\mu_1 - \mu_2$ in order to make a statistical inference.

A confidence interval for $\mu_1 - \mu_2$ may be obtained as follows: Let
$X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ denote, respectively, independent
random samples from the two distributions, $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$,
respectively. Denote the means of the samples by $\bar{X}$ and $\bar{Y}$ and the
variances of the samples by $S_1^2$ and $S_2^2$, respectively. It should be noted
that these four statistics are independent. The independence of $\bar{X}$ and
$S_1^2$ (and, inferentially, that of $\bar{Y}$ and $S_2^2$) was established in Section 4.8;
the assumption that the two samples are independent accounts for the
independence of the others. Thus $\bar{X}$ and $\bar{Y}$ are normally and
independently distributed with means $\mu_1$ and $\mu_2$ and variances $\sigma^2/n$ and
$\sigma^2/m$, respectively. In accordance with Section 4.7, their difference
$\bar{X} - \bar{Y}$ is normally distributed with mean $\mu_1 - \mu_2$ and variance
$\sigma^2/n + \sigma^2/m$. Then the random variable

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2/n + \sigma^2/m}}$$

is normally distributed with zero mean and unit variance. This random
variable may serve as the numerator of a $T$ random variable. Further,
$nS_1^2/\sigma^2$ and $mS_2^2/\sigma^2$ have independent chi-square distributions with
$n - 1$ and $m - 1$ degrees of freedom, respectively, so that their sum
$(nS_1^2 + mS_2^2)/\sigma^2$ has a chi-square distribution with $n + m - 2$ degrees
of freedom, provided that $m + n - 2 > 0$. Because of the independence
of $\bar{X}$, $\bar{Y}$, $S_1^2$, and $S_2^2$, it is seen that

$$\sqrt{\frac{nS_1^2 + mS_2^2}{\sigma^2(n + m - 2)}}$$

may serve as the denominator of a $T$ random variable. That is, the
random variable

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{nS_1^2 + mS_2^2}{n + m - 2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}$$

has a $t$-distribution with $n + m - 2$ degrees of freedom. As in the
previous section, we can (once $n$ and $m$ are specified positive integers
with $n + m - 2 > 0$) find a positive number $b$ from Table IV of
Appendix B such that

$$\Pr(-b < T < b) = 0.95.$$

If we set

$$R = \sqrt{\frac{nS_1^2 + mS_2^2}{n + m - 2}\left(\frac{1}{n} + \frac{1}{m}\right)},$$

this probability may be written in the form

$$\Pr[(\bar{X} - \bar{Y}) - bR < \mu_1 - \mu_2 < (\bar{X} - \bar{Y}) + bR] = 0.95.$$

It follows that the random interval

$$\left[(\bar{X} - \bar{Y}) - bR,\ (\bar{X} - \bar{Y}) + bR\right]$$

has probability 0.95 of including the unknown fixed point $\mu_1 - \mu_2$. As
usual, the experimental values of $\bar{X}$, $\bar{Y}$, $S_1^2$, and $S_2^2$, namely $\bar{x}$, $\bar{y}$, $s_1^2$, and
$s_2^2$, will provide a 95 percent confidence interval for $\mu_1 - \mu_2$ when the
variances of the two normal distributions are unknown but equal. A
consideration of the difficulty encountered when the unknown
variances of the two normal distributions are not equal is assigned to
one of the exercises.

Example 1. It may be verified that if in the preceding discussion $n = 10$,
$m = 7$, $\bar{x} = 4.2$, $\bar{y} = 3.4$, $s_1^2 = 49$, $s_2^2 = 32$, then the interval $(-5.16, 6.76)$ is a
90 percent confidence interval for $\mu_1 - \mu_2$.
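
The arithmetic of Example 1 can be checked with a short sketch. It reads the example's values as n = 10, m = 7, sample means 4.2 and 3.4, and sample variances 49 and 32 (variances that divide by n and m, as in this text). The function name `pooled_t_interval` is illustrative only, and the value b = 1.753 is assumed to be the tabled 95th percentile of the t-distribution with 15 degrees of freedom.

```python
import math

def pooled_t_interval(n, m, xbar, ybar, s1_sq, s2_sq, b):
    """Return ((xbar - ybar) - b*R, (xbar - ybar) + b*R).

    s1_sq and s2_sq are sample variances that divide by n and m,
    so n*s1_sq and m*s2_sq are the sums of squared deviations.
    """
    r = math.sqrt((n * s1_sq + m * s2_sq) / (n + m - 2) * (1 / n + 1 / m))
    diff = xbar - ybar
    return diff - b * r, diff + b * r

# b = 1.753: assumed table value for 15 degrees of freedom (90 percent interval)
lo, hi = pooled_t_interval(10, 7, 4.2, 3.4, 49, 32, b=1.753)
print(round(lo, 2), round(hi, 2))
```

Here R works out to 3.4, so the half-width bR is about 5.96 around the observed difference 0.8.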

Let $Y_1$ and $Y_2$ be two independent random variables with binomial
distributions $b(n_1, p_1)$ and $b(n_2, p_2)$, respectively. Let us now turn to the
problem of finding a confidence interval for the difference $p_1 - p_2$ of
the means of $Y_1/n_1$ and $Y_2/n_2$ when $n_1$ and $n_2$ are known. Since the mean
and the variance of $Y_1/n_1 - Y_2/n_2$ are, respectively, $p_1 - p_2$ and
$p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$, then the random variable given by the
ratio

$$\frac{(Y_1/n_1 - Y_2/n_2) - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$$

has mean zero and variance 1 for all positive integers $n_1$ and $n_2$. Moreover,
since both $Y_1$ and $Y_2$ have approximate normal distributions
for large $n_1$ and $n_2$, one suspects that the ratio has an approximate
normal distribution. This is actually the case, but it will not be
proved here. Moreover, if $n_1/n_2 = c$, where $c$ is a fixed positive constant,
the result of Exercise 6.36 shows that the random variable

$$\frac{(Y_1/n_1)(1 - Y_1/n_1)/n_1 + (Y_2/n_2)(1 - Y_2/n_2)/n_2}{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2} \tag{1}$$

converges in probability to 1 as $n_2 \to \infty$ (and thus $n_1 \to \infty$, since
$n_1/n_2 = c$, $c > 0$). In accordance with Theorem 6, Section 5.5, the
random variable

$$W = \frac{(Y_1/n_1 - Y_2/n_2) - (p_1 - p_2)}{U},$$

where

$$U = \sqrt{(Y_1/n_1)(1 - Y_1/n_1)/n_1 + (Y_2/n_2)(1 - Y_2/n_2)/n_2},$$

has a limiting distribution that is $N(0, 1)$. The event $-2 < W < 2$, the
probability of which is approximately equal to 0.954, is equivalent to
the event

$$\frac{Y_1}{n_1} - \frac{Y_2}{n_2} - 2U < p_1 - p_2 < \frac{Y_1}{n_1} - \frac{Y_2}{n_2} + 2U.$$

Accordingly, the experimental values $y_1$ and $y_2$ of $Y_1$ and $Y_2$,
respectively, will provide an approximate 95.4 percent confidence
interval for $p_1 - p_2$.

Example 2. If, in the preceding discussion, we take $n_1 = 100$, $n_2 = 400$,
$y_1 = 30$, $y_2 = 80$, then the experimental values of $Y_1/n_1 - Y_2/n_2$ and $U$ are 0.1
and $\sqrt{(0.3)(0.7)/100 + (0.2)(0.8)/400} = 0.05$, respectively. Thus the interval
$(0, 0.2)$ is an approximate 95.4 percent confidence interval for $p_1 - p_2$.
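
The computation in Example 2 is short enough to reproduce directly. The following is a minimal sketch, assuming the observed counts 30 out of 100 and 80 out of 400 and the multiplier 2 that goes with the 0.954 probability; the function name `binomial_difference_interval` is hypothetical.

```python
import math

def binomial_difference_interval(y1, n1, y2, n2, z=2.0):
    """Approximate interval for p1 - p2: (y1/n1 - y2/n2) -/+ z*U."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    # U estimates the standard deviation of Y1/n1 - Y2/n2
    u = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * u, diff + z * u, u

lo, hi, u = binomial_difference_interval(30, 100, 80, 400)
print(round(u, 4), round(lo, 4), round(hi, 4))
```

Replacing z = 2 by another normal percentile (for instance 1.645) gives an interval with a different approximate confidence coefficient.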


EXERCISES 

6.30. Let two independent random samples, each of size 10, from two normal
distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ yield $\bar{x} = 4.8$, $s_1^2 = 8.64$, $\bar{y} = 5.6$,
$s_2^2 = 7.88$. Find a 95 percent confidence interval for $\mu_1 - \mu_2$.

6.31. Let two independent random variables $Y_1$ and $Y_2$, with binomial
distributions that have parameters $n_1 = n_2 = 100$, $p_1$, and $p_2$, respectively,
be observed to be equal to $y_1 = 50$ and $y_2 = 40$. Determine an approximate
90 percent confidence interval for $p_1 - p_2$.

6.32. Discuss the problem of finding a confidence interval for the difference
$\mu_1 - \mu_2$ between the two means of two normal distributions if the variances
$\sigma_1^2$ and $\sigma_2^2$ are known but not necessarily equal.

6.33. Discuss Exercise 6.32 when it is assumed that the variances are unknown
and unequal. This is a very difficult problem, and the discussion
should point out exactly where the difficulty lies. If, however, the variances
are unknown but their ratio $\sigma_1^2/\sigma_2^2$ is a known constant $k$, then a statistic that
is a $T$ random variable can again be used. Why?

6.34. As an illustration of Exercise 6.33, one can let $X_1, X_2, \ldots, X_9$ and
$Y_1, Y_2, \ldots, Y_{12}$ represent two independent random samples from the
respective normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. It is given that
$\sigma_1^2 = 3\sigma_2^2$, but $\sigma_2^2$ is unknown. Define a random variable which has a
$t$-distribution that can be used to find a 95 percent interval for $\mu_1 - \mu_2$.

6.35. Let $\bar{X}$ and $\bar{Y}$ be the means of two independent random samples, each
of size $n$, from the respective distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where the
common variance is known. Find $n$ such that

$$\Pr(\bar{X} - \bar{Y} - \sigma/5 < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + \sigma/5) = 0.90.$$

6.36. Under the conditions given, show that the random variable defined by 
ratio (1) of the text converges in probability to 1. 

6.37. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be two independent random
samples from the respective normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$,
where the four parameters are unknown. To construct a confidence interval
for the ratio, $\sigma_1^2/\sigma_2^2$, of the variances, form the quotient of the two
independent chi-square variables, each divided by its degrees of freedom,
namely

$$F = \frac{mS_2^2/[\sigma_2^2(m - 1)]}{nS_1^2/[\sigma_1^2(n - 1)]},$$

where $S_1^2$ and $S_2^2$ are the respective sample variances.

(a) What kind of distribution does $F$ have?

(b) From the appropriate table, $a$ and $b$ can be found so that
$\Pr(F < b) = 0.975$ and $\Pr(a < F < b) = 0.95$.

(c) Rewrite the second probability statement as

$$\Pr\left[a\,\frac{nS_1^2/(n - 1)}{mS_2^2/(m - 1)} < \frac{\sigma_1^2}{\sigma_2^2} < b\,\frac{nS_1^2/(n - 1)}{mS_2^2/(m - 1)}\right] = 0.95.$$

The observed values, $s_1^2$ and $s_2^2$, can be inserted in these inequalities to
provide a 95 percent confidence interval for $\sigma_1^2/\sigma_2^2$.

6.4 Tests of Statistical Hypotheses 

The two principal areas of statistical inference are the areas of
estimation of parameters and of tests of statistical hypotheses. The
problem of estimation of parameters, both point and interval estimation,
has been treated. In Sections 6.4 and 6.5 some aspects
of statistical hypotheses and tests of statistical hypotheses will
be considered. The subject will be introduced by way of example.

Example 1. Let it be known that the outcome $X$ of a random experiment
is $N(\theta, 100)$. For instance, $X$ may denote a score on a test, which score
we assume to be normally distributed with mean $\theta$ and variance 100. Let
us say the past experience with this random experiment indicates that
$\theta = 75$. Suppose, owing possibly to some research in the area pertaining to
this experiment, some changes are made in the method of performing
this random experiment. It is then suspected that no longer does $\theta = 75$
but that now $\theta > 75$. There is as yet no formal experimental evidence
that $\theta > 75$; hence the statement $\theta > 75$ is a conjecture or a statistical
hypothesis. In admitting that the statistical hypothesis $\theta > 75$ may be false,
we allow, in effect, the possibility that $\theta \le 75$. Thus there are actually two
statistical hypotheses. First, that the unknown parameter $\theta \le 75$; that is,
there has been no increase in $\theta$. Second, that the unknown parameter
$\theta > 75$. Accordingly, the parameter space is $\Omega = \{\theta : -\infty < \theta < \infty\}$. We
denote the first of these hypotheses by the symbols $H_0 : \theta \le 75$ and the
second by the symbols $H_1 : \theta > 75$. Since the values $\theta > 75$ are alternatives
to those where $\theta \le 75$, the hypothesis $H_1 : \theta > 75$ is called the alternative
hypothesis. Needless to say, $H_0$ could be called the alternative to $H_1$;
however, the conjecture, here $\theta > 75$, that is made by the research worker
is usually taken to be the alternative hypothesis. In any case the problem
is to decide which of these hypotheses is to be accepted. To reach a decision,
the random experiment is to be repeated a number of independent times,
say $n$, and the results observed. That is, we consider a random sample
$X_1, X_2, \ldots, X_n$ from a distribution that is $N(\theta, 100)$, and we devise a rule
that will tell us what decision to make once the experimental values,
say $x_1, x_2, \ldots, x_n$, have been determined. Such a rule is called a test of
the hypothesis $H_0 : \theta \le 75$ against the alternative hypothesis $H_1 : \theta > 75$.
There is no bound on the number of rules or tests that can be constructed.
We shall consider three such tests. Our tests will be constructed
around the following notion. We shall partition the sample space $\mathcal{A}$ into a
subset $C$ and its complement $C^*$. If the experimental values of $X_1, X_2, \ldots, X_n$,
say $x_1, x_2, \ldots, x_n$, are such that the point $(x_1, x_2, \ldots, x_n) \in C$, we shall reject
the hypothesis $H_0$ (accept the hypothesis $H_1$). If we have $(x_1, x_2, \ldots, x_n) \in C^*$,
we shall accept the hypothesis $H_0$ (reject the hypothesis $H_1$).

Test 1. Let $n = 25$. The sample space $\mathcal{A}$ is the set

$$\{(x_1, x_2, \ldots, x_{25}) : -\infty < x_i < \infty,\ i = 1, 2, \ldots, 25\}.$$

Let the subset $C$ of the sample space be

$$C = \{(x_1, x_2, \ldots, x_{25}) : x_1 + x_2 + \cdots + x_{25} > (25)(75)\}.$$

We shall reject the hypothesis $H_0$ if and only if our 25 experimental values are
such that $(x_1, x_2, \ldots, x_{25}) \in C$. If $(x_1, x_2, \ldots, x_{25})$ is not an element of $C$, we
shall accept the hypothesis $H_0$. This subset $C$ of the sample space that leads
to the rejection of the hypothesis $H_0 : \theta \le 75$ is called the critical region of Test 1.
Now $\sum_{1}^{25} x_i > (25)(75)$ if and only if $\bar{x} > 75$, where $\bar{x} = \sum_{1}^{25} x_i/25$. Thus we can
much more conveniently say that we shall reject the hypothesis $H_0 : \theta \le 75$ and
accept the hypothesis $H_1 : \theta > 75$ if and only if the experimentally determined
value of the sample mean $\bar{x}$ is greater than 75. If $\bar{x} \le 75$, we accept the
hypothesis $H_0 : \theta \le 75$. Our test then amounts to this: We shall reject the
hypothesis $H_0 : \theta \le 75$ if the mean of the sample exceeds the maximum value
of the mean of the distribution when the hypothesis $H_0$ is true.

It would help us to evaluate a test of a statistical hypothesis if we knew
the probability of rejecting that hypothesis (and hence of accepting the
alternative hypothesis). In our Test 1, this means that we want to compute the
probability

$$\Pr[(X_1, \ldots, X_{25}) \in C] = \Pr(\bar{X} > 75).$$

Obviously, this probability is a function of the parameter $\theta$ and we shall denote
it by $K_1(\theta)$. The function $K_1(\theta) = \Pr(\bar{X} > 75)$ is called the power function of
Test 1, and the value of the power function at a parameter point is called the
power of Test 1 at that point. Because $\bar{X}$ is $N(\theta, 4)$, we have

$$K_1(\theta) = \Pr\left(\frac{\bar{X} - \theta}{2} > \frac{75 - \theta}{2}\right) = 1 - \Phi\left(\frac{75 - \theta}{2}\right).$$

So, for illustration, we have, by Table III of Appendix B, that the power at
$\theta = 75$ is $K_1(75) = 0.500$. Other powers are $K_1(73) = 0.159$, $K_1(77) = 0.841$,
and $K_1(79) = 0.977$. The graph of $K_1(\theta)$ of Test 1 is depicted in Figure 6.1.
Among other things, this means that, if $\theta = 75$, the probability of rejecting
the hypothesis $H_0 : \theta \le 75$ is $\tfrac{1}{2}$. That is, if $\theta = 75$ so that $H_0$ is true, the
probability of rejecting this true hypothesis is $\tfrac{1}{2}$. Many statisticians and
research workers find it very undesirable to have such a high probability as
$\tfrac{1}{2}$ assigned to this kind of mistake: namely the rejection of $H_0$ when $H_0$ is a true
hypothesis. Thus Test 1 does not appear to be a very satisfactory test. Let us
try to devise another test that does not have this objectionable feature. We
shall do this by making it more difficult to reject the hypothesis $H_0$, with the
hope that this will give a smaller probability of rejecting $H_0$ when that
hypothesis is true.

[FIGURE 6.1: graph of the power function $K_1(\theta)$ of Test 1]

Test 2. Let $n = 25$. We shall reject the hypothesis $H_0 : \theta \le 75$ and accept
the hypothesis $H_1 : \theta > 75$ if and only if $\bar{x} > 78$. Here the critical region is
$C = \{(x_1, \ldots, x_{25}) : x_1 + \cdots + x_{25} > (25)(78)\}$. The power function of
Test 2 is, because $\bar{X}$ is $N(\theta, 4)$,

$$K_2(\theta) = \Pr(\bar{X} > 78) = 1 - \Phi\left(\frac{78 - \theta}{2}\right).$$

Some values of the power function of Test 2 are $K_2(73) = 0.006$,
$K_2(75) = 0.067$, $K_2(77) = 0.309$, and $K_2(79) = 0.691$. That is, if $\theta = 75$, the
probability of rejecting $H_0 : \theta \le 75$ is 0.067; this is much more desirable than
the corresponding probability $\tfrac{1}{2}$ that resulted from Test 1. However, if $H_0$ is
false and, in fact, $\theta = 77$, the probability of rejecting $H_0 : \theta \le 75$ (and hence
of accepting $H_1 : \theta > 75$) is only 0.309. In certain instances, this low
probability 0.309 of a correct decision (the acceptance of $H_1$ when $H_1$ is true)
is objectionable. That is, Test 2 is not wholly satisfactory. Perhaps we can
overcome the undesirable features of Tests 1 and 2 if we proceed as in Test 3.

Test 3. Let us first select a power function $K_3(\theta)$ that has the features of
a small value at $\theta = 75$ and a large value at $\theta = 77$. For instance, take
$K_3(75) = 0.159$ and $K_3(77) = 0.841$. To determine a test with such a power
function, let us reject $H_0 : \theta \le 75$ if and only if the experimental value $\bar{x}$ of the
mean of a random sample of size $n$ is greater than some constant $c$. Thus the
critical region is $C = \{(x_1, x_2, \ldots, x_n) : x_1 + x_2 + \cdots + x_n > nc\}$. It
should be noted that the sample size $n$ and the constant $c$ have not been
determined as yet. However, since $\bar{X}$ is $N(\theta, 100/n)$, the power function is

$$K_3(\theta) = \Pr(\bar{X} > c) = 1 - \Phi\left(\frac{c - \theta}{10/\sqrt{n}}\right).$$

The conditions $K_3(75) = 0.159$ and $K_3(77) = 0.841$ require that

$$1 - \Phi\left(\frac{c - 75}{10/\sqrt{n}}\right) = 0.159, \qquad 1 - \Phi\left(\frac{c - 77}{10/\sqrt{n}}\right) = 0.841.$$

Equivalently, from Table III of Appendix B, we have

$$\frac{c - 75}{10/\sqrt{n}} = 1, \qquad \frac{c - 77}{10/\sqrt{n}} = -1.$$

The solution to these two equations in $n$ and $c$ is $n = 100$, $c = 76$. With these
values of $n$ and $c$, other powers of Test 3 are $K_3(73) = 0.001$ and
$K_3(79) = 0.999$. It is important to observe that although Test 3 has a more
desirable power function than those of Tests 1 and 2, a certain “price” has
been paid: a sample size of $n = 100$ is required in Test 3, whereas we had
$n = 25$ in the earlier tests.
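
The tabled powers of the three tests can be reproduced numerically. The sketch below assumes only what the example states: each test rejects when the sample mean exceeds a constant c, and the sample mean is normal with mean theta and variance 100/n. The helper name `power` and the dictionary `tests` are illustrative; the standard normal distribution function comes from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def power(theta, c, n, sigma=10.0):
    """K(theta) = Pr(sample mean > c) when the sample mean is N(theta, sigma^2/n)."""
    return 1 - PHI((c - theta) / (sigma / n ** 0.5))

# (c, n) for each test of Example 1
tests = {"Test 1": (75, 25), "Test 2": (78, 25), "Test 3": (76, 100)}
for name, (c, n) in tests.items():
    print(name, [round(power(t, c, n), 3) for t in (73, 75, 77, 79)])
```

Each row lists the power at 73, 75, 77, and 79, so the three power functions can be compared point by point.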


Remark. Throughout the text we frequently say that we accept the
hypothesis $H_0$ if we do not reject $H_0$ in favor of $H_1$. If this decision is made,
it certainly does not mean that $H_0$ is true or that we even believe that it is true.
All it means is, based upon the data at hand, that we are not convinced that
the hypothesis $H_0$ is wrong. Accordingly, the statement “We accept $H_0$” would
possibly be better read as “We do not reject $H_0$.” However, because it is in
fairly common use, we use the statement “We accept $H_0$,” but read it with this
remark in mind.


We have now illustrated the following concepts:

1. A statistical hypothesis.

2. A test of a hypothesis against an alternative hypothesis and the
associated concept of the critical region of the test.

3. The power of a test. 

These concepts will now be formally defined. 


Definition 3. A statistical hypothesis is an assertion about the
distribution of one or more random variables. If the statistical
hypothesis completely specifies the distribution, it is called a simple
statistical hypothesis; if it does not, it is called a composite statistical
hypothesis.

If we refer to Example 1, we see that both $H_0 : \theta \le 75$ and
$H_1 : \theta > 75$ are composite statistical hypotheses, since neither of them
completely specifies the distribution. If there, instead of $H_0 : \theta \le 75$, we
had $H_0 : \theta = 75$, then $H_0$ would have been a simple statistical
hypothesis.

Definition 4. A test of a statistical hypothesis is a rule which, when
the experimental sample values have been obtained, leads to a decision
to accept or to reject the hypothesis under consideration.

Definition 5. Let $C$ be that subset of the sample space which, in
accordance with a prescribed test, leads to the rejection of the
hypothesis under consideration. Then $C$ is called the critical region of
the test.




Definition 6. The power function of a test of a statistical hypothesis
$H_0$ against an alternative hypothesis $H_1$ is that function, defined for
all distributions under consideration, which yields the probability that
the sample point falls in the critical region $C$ of the test, that is, a
function that yields the probability of rejecting the hypothesis under
consideration. The value of the power function at a parameter point
is called the power of the test at that point.

Definition 7. Let $H_0$ denote a hypothesis that is to be tested against
an alternative hypothesis $H_1$ in accordance with a prescribed test. The
significance level of the test (or the size of the critical region $C$) is the
maximum value (actually supremum) of the power function of the test
when $H_0$ is true.

If we refer again to Example 1, we see that the significance levels
of Tests 1, 2, and 3 of that example are 0.500, 0.067, and 0.159,
respectively. An additional example may help clarify these definitions.

Example 2. It is known that the random variable $X$ has a p.d.f. of the form

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < \infty,$$
$$\phantom{f(x; \theta)} = 0 \quad \text{elsewhere}.$$

It is desired to test the simple hypothesis $H_0 : \theta = 2$ against the alternative
simple hypothesis $H_1 : \theta = 4$. Thus $\Omega = \{\theta : \theta = 2, 4\}$. A random sample
$X_1, X_2$ of size $n = 2$ will be used. The test to be used is defined by taking the
critical region to be $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$. The power function
of the test and the significance level of the test will be determined.

There are but two probability density functions under consideration,
namely, $f(x; 2)$ specified by $H_0$ and $f(x; 4)$ specified by $H_1$. Thus the power
function is defined at but two points $\theta = 2$ and $\theta = 4$. The power function of
the test is given by $\Pr[(X_1, X_2) \in C]$. If $H_0$ is true, that is, $\theta = 2$, the joint p.d.f.
of $X_1$ and $X_2$ is

$$f(x_1; 2)f(x_2; 2) = \tfrac{1}{4} e^{-(x_1 + x_2)/2}, \quad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$
$$\phantom{f(x_1; 2)f(x_2; 2)} = 0 \quad \text{elsewhere},$$

and

$$\Pr[(X_1, X_2) \in C] = 1 - \Pr[(X_1, X_2) \in C^*]
= 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \tfrac{1}{4} e^{-(x_1 + x_2)/2}\, dx_1\, dx_2
= 0.05, \text{ approximately}.$$

If $H_1$ is true, that is, $\theta = 4$, the joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1; 4)f(x_2; 4) = \tfrac{1}{16} e^{-(x_1 + x_2)/4}, \quad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$
$$\phantom{f(x_1; 4)f(x_2; 4)} = 0 \quad \text{elsewhere},$$

and

$$\Pr[(X_1, X_2) \in C] = 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \tfrac{1}{16} e^{-(x_1 + x_2)/4}\, dx_1\, dx_2
= 0.31, \text{ approximately}.$$

Thus the power of the test is given by 0.05 for $\theta = 2$ and by 0.31 for $\theta = 4$.
That is, the probability of rejecting $H_0$ when $H_0$ is true is 0.05, and the
probability of rejecting $H_0$ when $H_0$ is false is 0.31. Since the significance level
of this test (or the size of the critical region) is the power of the test when $H_0$
is true, the significance level of this test is 0.05.

The fact that the power of this test, when $\theta = 4$, is only 0.31 immediately
suggests that a search be made for another test which, with the same power
when $\theta = 2$, would have a power greater than 0.31 when $\theta = 4$. However, later
it will be clear that such a search would be fruitless. That is, there is no test
with a significance level of 0.05 and based on a random sample of size $n = 2$
that has greater power at $\theta = 4$. The only manner in which the situation may
be improved is to have recourse to a random sample of size $n$ greater than 2.

Our computations of the powers of this test at the two points $\theta = 2$ and
$\theta = 4$ were purposely done the hard way to focus attention on fundamental
concepts. A procedure that is computationally simpler is the following. When
the hypothesis $H_0$ is true, the random variable $X$ is $\chi^2(2)$. Thus the random
variable $X_1 + X_2 = Y$, say, is $\chi^2(4)$. Accordingly, the power of the test when
$H_0$ is true is given by

$$\Pr(Y \ge 9.5) = 1 - \Pr(Y < 9.5) = 1 - 0.95 = 0.05,$$

from Table II of Appendix B. When the hypothesis $H_1$ is true, the random
variable $X/2$ is $\chi^2(2)$; so the random variable $(X_1 + X_2)/2 = Z$, say, is $\chi^2(4)$.
Accordingly, the power of the test when $H_1$ is true is given by

$$\Pr(X_1 + X_2 \ge 9.5) = \Pr(Z \ge 4.75) = \int_{4.75}^{\infty} \tfrac{1}{4} z e^{-z/2}\, dz,$$

which is equal to 0.31, approximately.
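
Both powers can also be evaluated in closed form, which gives a quick check on the table lookups. The sketch below uses a standard fact not stated in the example: the sum of two independent random variables with p.d.f. (1/theta)e^{-x/theta} has a gamma distribution with alpha = 2 and beta = theta, so Pr(X1 + X2 >= s) = e^{-s/theta}(1 + s/theta). The function name `power_at` is illustrative.

```python
import math

def power_at(theta, cutoff=9.5):
    """Pr(X1 + X2 >= cutoff) for two independent exponentials with mean theta.

    Uses the gamma(alpha=2, beta=theta) survival function
    exp(-t) * (1 + t) with t = cutoff / theta.
    """
    t = cutoff / theta
    return math.exp(-t) * (1 + t)

print(round(power_at(2), 3))  # power when H0 (theta = 2) is true
print(round(power_at(4), 3))  # power when H1 (theta = 4) is true
```

The two values agree with 0.05 and 0.31 to the two decimal places quoted in the example.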

Remark. The rejection of the hypothesis $H_0$ when that hypothesis is true
is, of course, an incorrect decision or an error. This incorrect decision is often
called a type I error; accordingly, the significance level of the test is the
probability of committing an error of type I. The acceptance of $H_0$ when $H_0$
is false ($H_1$ is true) is called an error of type II. Thus the probability of a
type II error is 1 minus the power of the test when $H_1$ is true. Frequently, it
is disconcerting to the student to discover that there are so many names for
the same thing. However, since all of them are used in the statistical literature,
we feel obligated to point out that “significance level,” “size of the critical
region,” “power of the test when $H_0$ is true,” and “the probability of
committing an error of type I” are all equivalent.

EXERCISES

6.38. Let $X$ have a p.d.f. of the form $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero
elsewhere, where $\theta \in \{\theta : \theta = 1, 2\}$. To test the simple hypothesis $H_0 : \theta = 1$
against the alternative simple hypothesis $H_1 : \theta = 2$, use a random
sample $X_1, X_2$ of size $n = 2$ and define the critical region to be
$C = \{(x_1, x_2) : \tfrac{3}{4} \le x_1 x_2\}$. Find the power function of the test.

6.39. Let $X$ have a binomial distribution with parameters $n = 10$ and
$p \in \{p : p = \tfrac{1}{4}, \tfrac{1}{2}\}$. The simple hypothesis $H_0 : p = \tfrac{1}{2}$ is rejected, and the
alternative simple hypothesis $H_1 : p = \tfrac{1}{4}$ is accepted, if the observed value of
$X_1$, a random sample of size 1, is less than or equal to 3. Find the power
function of the test.

6.40. Let $X_1, X_2$ be a random sample of size $n = 2$ from the distribution having
p.d.f. $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere. We reject $H_0 : \theta = 2$
and accept $H_1 : \theta = 1$ if the observed values of $X_1, X_2$, say $x_1, x_2$, are such
that

$$\frac{f(x_1; 2)f(x_2; 2)}{f(x_1; 1)f(x_2; 1)} \le \frac{1}{2}.$$

Here $\Omega = \{\theta : \theta = 1, 2\}$. Find the significance level of the test and the power
of the test when $H_0$ is false.

6.41. Sketch, as in Figure 6.1, the graphs of the power functions of Tests 1, 
2, and 3 of Example 1 of this section. 

6.42. Let us assume that the life of a tire in miles, say $X$, is normally distributed
with mean $\theta$ and standard deviation 5000. Past experience indicates that
$\theta = 30{,}000$. The manufacturer claims that the tires made by a new process
have mean $\theta > 30{,}000$, and it is very possible that $\theta = 35{,}000$. Let us check
his claim by testing $H_0 : \theta = 30{,}000$ against $H_1 : \theta > 30{,}000$. We shall
observe $n$ independent values of $X$, say $x_1, \ldots, x_n$, and we shall reject $H_0$
(thus accept $H_1$) if and only if $\bar{x} \ge c$. Determine $n$ and $c$ so that the power
function $K(\theta)$ of the test has the values $K(30{,}000) = 0.01$ and
$K(35{,}000) = 0.98$.

6.43. Let $X$ have a Poisson distribution with mean $\theta$. Consider the simple
hypothesis $H_0 : \theta = \tfrac{1}{2}$ and the alternative composite hypothesis
$H_1 : \theta < \tfrac{1}{2}$. Thus $\Omega = \{\theta : 0 < \theta \le \tfrac{1}{2}\}$. Let $X_1, \ldots, X_{12}$ denote a random
sample of size 12 from this distribution. We reject $H_0$ if and only if the
observed value of $Y = X_1 + \cdots + X_{12} \le 2$. If $K(\theta)$ is the power function of the
test, find the powers $K(\tfrac{1}{2})$, $K(\tfrac{1}{3})$, $K(\tfrac{1}{4})$, $K(\tfrac{1}{6})$, and $K(\tfrac{1}{12})$. Sketch the graph
of $K(\theta)$. What is the significance level of the test?

6.44. Let $Y$ have a binomial distribution with parameters $n$ and $p$. We reject
$H_0 : p = \tfrac{1}{2}$ and accept $H_1 : p > \tfrac{1}{2}$ if $Y \ge c$. Find $n$ and $c$ to give a power
function $K(p)$ which is such that $K(\tfrac{1}{2}) = 0.10$ and $K(\tfrac{2}{3}) = 0.95$,
approximately.

6.45. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size
$n = 4$ from a distribution with p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero
elsewhere, where $0 < \theta$. The hypothesis $H_0 : \theta = 1$ is rejected and $H_1 : \theta > 1$
accepted if the observed $Y_4 \ge c$.

(a) Find the constant c so that the significance level is a = 0.05. 

(b) Determine the power function of the test. 

6.5 Additional Comments About Statistical Tests 

All of the alternative hypotheses considered in Section 6.4 were
one-sided hypotheses. For illustration, in Exercise 6.42 we tested
$H_0 : \theta = 30{,}000$ against the one-sided alternative $H_1 : \theta > 30{,}000$,
where $\theta$ is the mean of a normal distribution having standard deviation
$\sigma = 5000$. The test associated with this situation, namely reject $H_0$ if
and only if the sample mean $\bar{x} \ge c$, is a one-sided test. For convenience,
we often call $H_0 : \theta = 30{,}000$ the null hypothesis because, as in this
exercise, it suggests that the new process has not changed the mean of
the distribution. That is, the new process has been used without
consequence if in fact the mean still equals 30,000; hence the
terminology null hypothesis is appropriate. So in Exercise 6.42 we are
testing a simple null hypothesis against a composite one-sided
alternative with a one-sided test.

This does suggest that there could be two-sided alternative
hypotheses. For illustration, in Exercise 6.42, suppose there is the
possibility that the new process might decrease the mean. That is, say
that we simply do not know whether with the new process $\theta > 30{,}000$
or $\theta < 30{,}000$; or there has been no change and the null hypothesis
$H_0 : \theta = 30{,}000$ is still true. Then we would want to test $H_0 : \theta = 30{,}000$
against the two-sided alternative $H_1 : \theta \ne 30{,}000$. To help see how to
construct a two-sided test for $H_0$ against $H_1$, consider the following
argument.

In dealing with a test of $H_0 : \theta = 30{,}000$ against the one-sided
alternative $H_1 : \theta > 30{,}000$, we used $\bar{X} \ge c$ or, equivalently,

$$Z = \frac{\bar{X} - 30{,}000}{\sigma/\sqrt{n}} \ge c_1,$$

where since $\bar{X}$ is $N(\theta = 30{,}000, \sigma^2/n)$ under $H_0$, $Z$ is $N(0, 1)$; and we
could select $c_1 = 1.645$ to have a test of significance level $\alpha = 0.05$. That
is, if $\bar{X}$ is $1.645\sigma/\sqrt{n}$ greater than the mean $\theta = 30{,}000$, we would reject
$H_0$ and accept $H_1$ and the significance level would be equal to $\alpha = 0.05$.
To test $H_0 : \theta = 30{,}000$ against $H_1 : \theta \ne 30{,}000$, let us again use $\bar{X}$
through $Z$ and reject $H_0$ if $\bar{X}$ or $Z$ is too large or too small. Namely,
if we reject $H_0$ and accept $H_1$ when

$$|Z| = \left|\frac{\bar{X} - 30{,}000}{\sigma/\sqrt{n}}\right| \ge 1.96,$$

the significance level $\alpha = 0.05$ because this is the probability of
$|Z| \ge 1.96$ when $H_0$ is true.

It is interesting to note that the latter test is the equivalent of
saying that we reject $H_0$ and accept $H_1$ if 30,000 is not in the (two-sided)
confidence interval for the mean $\theta$. Or equivalently, if

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < 30{,}000 < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}},$$

then we accept $H_0 : \theta = 30{,}000$ because those two inequalities are
equivalent to

$$\left|\frac{\bar{X} - 30{,}000}{\sigma/\sqrt{n}}\right| < 1.96,$$

which leads to the acceptance of $H_0 : \theta = 30{,}000$.
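
The equivalence between the two-sided test and the confidence interval can be seen numerically. The sketch below is illustrative only: the sample size n = 25 and the two sample means are hypothetical values chosen for the demonstration, the function name `two_sided_z_test` is not from the text, and the critical value is taken from Python's `statistics.NormalDist` rather than rounded to 1.96.

```python
from statistics import NormalDist

def two_sided_z_test(xbar, theta0, sigma, n, alpha=0.05):
    """Return (reject H0?, is theta0 inside the two-sided interval?)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5
    z = (xbar - theta0) / se
    reject = abs(z) >= z_crit
    lo, hi = xbar - z_crit * se, xbar + z_crit * se
    return reject, lo < theta0 < hi

# Hypothetical data: sigma = 5000 as in Exercise 6.42, n = 25 assumed
print(two_sided_z_test(32500, 30000, 5000, 25))  # z = 2.5
print(two_sided_z_test(31000, 30000, 5000, 25))  # z = 1.0
```

In every case exactly one of the two returned values is true: the test rejects precisely when 30,000 falls outside the interval.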

Once we recognize this relationship between confidence intervals 
and tests of hypotheses, we can use all those statistics that we used to 
construct confidence intervals to test hypotheses, not only against 
two-sided alternatives but one-sided ones as well. Without listing all 
of these in a table, we give enough of them so that the principle can 
be understood. 

Example 1. Let $\bar{X}$ and $S^2$ be the mean and the variance of a random sample
of size $n$ coming from $N(\mu, \sigma^2)$. To test, at significance level $\alpha = 0.05$,
$H_0 : \mu = \mu_0$ against the two-sided alternative $H_1 : \mu \ne \mu_0$, reject if

$$|T| = \frac{|\bar{X} - \mu_0|}{S/\sqrt{n - 1}} \ge b,$$

where $b$ is the 97.5th percentile of the $t$-distribution with $n - 1$ degrees of
freedom.

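Example 2. Let independent random samples be taken from $N(\mu_1, \sigma^2)$ and
$N(\mu_2, \sigma^2)$, respectively. Say these have the respective sample characteristics $n$,
$\bar{X}$, $S_1^2$ and $m$, $\bar{Y}$, $S_2^2$. At $\alpha = 0.05$, reject $H_0 : \mu_1 = \mu_2$ and accept the one-sided
alternative $H_1 : \mu_1 > \mu_2$ if

$$T = \frac{\bar{X} - \bar{Y} - 0}{\sqrt{\dfrac{nS_1^2 + mS_2^2}{n + m - 2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}} \ge c.$$

Note that $\bar{X} - \bar{Y}$ has a normal distribution with mean zero under $H_0$. So $c$
is taken as the 95th percentile of a $t$-distribution with $n + m - 2$ degrees of
freedom to provide $\alpha = 0.05$.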

Example 3. Say $Y$ is $b(n, p)$. To test $H_0: p = p_0$ against $H_1: p < p_0$,
we use either

$$Z_1 = \frac{(Y/n) - p_0}{\sqrt{p_0(1 - p_0)/n}} \le c \quad\text{or}\quad Z_2 = \frac{(Y/n) - p_0}{\sqrt{(Y/n)(1 - Y/n)/n}} \le c.$$

If $n$ is large, both $Z_1$ and $Z_2$ have approximate standard normal
distributions provided that $H_0: p = p_0$ is true. Hence $c$ is taken to be
$-1.645$ to give an approximate significance level of $\alpha = 0.05$. Some
statisticians use $Z_1$ and others $Z_2$. We do not have a strong preference one
way or the other because the two methods provide about the same numerical
result. As one might suspect, using $Z_1$ provides better probabilities for
power calculations if the true $p$ is close to $p_0$, while $Z_2$ is better if
$H_0$ is clearly false. However, with a two-sided alternative hypothesis, $Z_2$
does provide a better relationship with the confidence interval for $p$. That
is, $|Z_2| < 2$ is equivalent to $p_0$ being in the interval from

$$\frac{Y}{n} - 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}} \quad\text{to}\quad \frac{Y}{n} + 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}},$$

which is the interval that provides a 95.4 percent confidence interval for $p$
as considered in Section 6.2.

In closing this section, we introduce the concepts of randomized
tests and $p$-values through an example and remarks that follow the
example.

Example 4. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size $n = 10$
from a Poisson distribution with mean $\theta$. A critical region for testing
$H_0: \theta = 0.1$ against $H_1: \theta > 0.1$ is given by
$Y = \sum_{i=1}^{10} X_i \ge 3$. The statistic $Y$ has a Poisson distribution
with mean $10\theta$. Thus, with $\theta = 0.1$ so that the mean of $Y$ is 1, the
significance level of the test is

$$\Pr(Y \ge 3) = 1 - \Pr(Y \le 2) = 1 - 0.920 = 0.080.$$

If the critical region defined by $\sum_{i=1}^{10} x_i \ge 4$ is used, the
significance level is

$$\alpha = \Pr(Y \ge 4) = 1 - \Pr(Y \le 3) = 1 - 0.981 = 0.019.$$

If a significance level of about $\alpha = 0.05$, say, is desired, most
statisticians would use one of these tests; that is, they would adjust the
significance level to that of one of these convenient tests. However, a
significance level of $\alpha = 0.05$ can be achieved exactly by rejecting $H_0$
if $\sum_{1}^{10} x_i \ge 4$ or if $\sum_{1}^{10} x_i = 3$ and an auxiliary
independent random experiment resulted in "success," where the probability of
success is selected to be equal to

$$\frac{0.050 - 0.019}{0.080 - 0.019} = \frac{31}{61}.$$

This is due to the fact that, when $\theta = 0.1$ so that the mean of $Y$ is 1,

$$\Pr(Y \ge 4) + \Pr(Y = 3 \text{ and success}) = 0.019 + \Pr(Y = 3)\Pr(\text{success}) = 0.019 + (0.061)\tfrac{31}{61} = 0.05.$$

The process of performing the auxiliary experiment to decide whether to reject
or not when $Y = 3$ is sometimes referred to as a randomized test.
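
The arithmetic of the randomized test can be reproduced with unrounded
Poisson probabilities. This is a small Python sketch under the assumptions of
Example 4; the helper names `poisson_pmf` and `poisson_upper_tail` are
introduced here for illustration and do not come from the text.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Pr(Y = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def poisson_upper_tail(k, mean):
    """Pr(Y >= k) for a Poisson random variable with the given mean."""
    return 1.0 - sum(poisson_pmf(j, mean) for j in range(k))


mean_under_h0 = 10 * 0.1                                  # n * theta = 1
size_reject_at_3 = poisson_upper_tail(3, mean_under_h0)   # about 0.080
size_reject_at_4 = poisson_upper_tail(4, mean_under_h0)   # about 0.019
alpha = 0.05

# Probability of "success" in the auxiliary experiment used when Y = 3.
gamma = (alpha - size_reject_at_4) / poisson_pmf(3, mean_under_h0)

# Size of the randomized test: reject if Y >= 4, or if Y = 3 and success.
size = size_reject_at_4 + gamma * poisson_pmf(3, mean_under_h0)
print(size_reject_at_3, size_reject_at_4, gamma, size)
```

With unrounded probabilities the success probability comes out near 0.506
rather than exactly 31/61 (about 0.508), because the text works with
three-decimal table values.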

Remarks. Not many statisticians like randomized tests in practice,
because the use of them means that two statisticians could make the same
assumptions, observe the same data, apply the same test, and yet make
different decisions. Hence they usually adjust their significance level so as not
to randomize. As a matter of fact, many statisticians report what are
commonly called $p$-values (for probability values). For illustration, if in
Example 4 the observed $Y$ is $y = 4$, the $p$-value is 0.019; and if it is
$y = 3$, the $p$-value is 0.080. That is, the $p$-value is the observed "tail"
probability of a statistic being at least as extreme as the particular observed
value when $H_0$ is true. Hence, more generally, if $Y = u(X_1, X_2, \ldots, X_n)$
is the statistic to be used in a test of $H_0$ and if the critical region is of
the form

$$u(x_1, x_2, \ldots, x_n) \le c,$$

an observed value $u(x_1, x_2, \ldots, x_n) = d$ would mean that the

$$p\text{-value} = \Pr(Y \le d;\ H_0).$$

That is, if $G(y)$ is the distribution function of $Y = u(X_1, X_2, \ldots, X_n)$,
provided that $H_0$ is true, the $p$-value is equal to $G(d)$ in this case.
However, $G(Y)$, in the continuous case, is uniformly distributed on the unit
interval, so an observed value $G(d) \le 0.05$ would be equivalent to selecting
$c$ so that

$$\Pr[u(X_1, X_2, \ldots, X_n) \le c;\ H_0] = 0.05$$

and observing that $d \le c$. Most computer programs automatically print out
the $p$-value of a test.

Example 5. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from
$N(\mu, \sigma^2 = 4)$. To test $H_0: \mu = 77$ against the one-sided alternative
hypothesis $H_1: \mu < 77$, say we observe the 25 values and determine that
$\bar{x} = 76.1$. The variance of $\bar{X}$ is $\sigma^2/n = 4/25 = 0.16$; so we
know that $Z = (\bar{X} - 77)/0.4$ is $N(0, 1)$ provided that $\mu = 77$. Since the
observed value of this test statistic is $z = (76.1 - 77)/0.4 = -2.25$, the
$p$-value of the test is $\Phi(-2.25) = 1 - 0.988 = 0.012$. Accordingly, if we
were using a significance level of $\alpha = 0.05$, we would reject $H_0$ and
accept $H_1: \mu < 77$ because $0.012 < 0.05$.
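
The $p$-value in Example 5 can be reproduced with the standard normal
distribution function. The following short Python sketch uses only the
standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma2, n, xbar = 77.0, 4.0, 25, 76.1

z = (xbar - mu0) / sqrt(sigma2 / n)   # (76.1 - 77) / 0.4
p_value = NormalDist().cdf(z)         # lower tail, because H1 is mu < 77

print(round(z, 2), round(p_value, 3))
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```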

EXERCISES 

6.46. Assume that the weight of cereal in a "10-ounce box" is $N(\mu, \sigma^2)$.
To test $H_0: \mu = 10.1$ against $H_1: \mu > 10.1$, we take a random sample of
size $n = 16$ and observe that $\bar{x} = 10.4$ and $s = 0.4$.

(a) Do we accept or reject $H_0$ at the 5 percent significance level?

(b) What is the approximate $p$-value of this test?

6.47. Each of 51 golfers hit three golf balls of brand X and three golf balls
of brand Y in a random order. Let $X_i$ and $Y_i$ equal the averages of the
distances traveled by the brand X and brand Y golf balls hit by the $i$th
golfer, $i = 1, 2, \ldots, 51$. Let $W_i = X_i - Y_i$, $i = 1, 2, \ldots, 51$. Test
$H_0: \mu_W = 0$ against $H_1: \mu_W > 0$, where $\mu_W$ is the mean of the
differences. If $\bar{w} = 2.07$ and $s_W^2 = 84.63$, would $H_0$ be accepted or
rejected at an $\alpha = 0.05$ significance level? What is the $p$-value of this
test?

6.48. Among the data collected for the World Health Organization air quality
monitoring project is a measure of suspended particles in $\mu$g/m$^3$. Let $X$
and $Y$ equal the concentration of suspended particles in $\mu$g/m$^3$ in the city
center (commercial district) for Melbourne and Houston, respectively. Using
$n = 13$ observations of $X$ and $m = 16$ observations of $Y$, we shall test
$H_0: \mu_X = \mu_Y$ against $H_1: \mu_X < \mu_Y$.

(a) Define the test statistic and critical region, assuming that the variances
are equal. Let $\alpha = 0.05$.

(b) If $\bar{x} = 72.9$, $s_x = 25.6$, $\bar{y} = 81.7$, and $s_y = 28.3$,
calculate the value of the test statistic and state your conclusion.

6.49. Let $p$ equal the proportion of drivers who use a seat belt in a state that
does not have a mandatory seat belt law. It was claimed that $p = 0.14$. An
advertising campaign was conducted to increase this proportion. Two
months after the campaign, $y = 104$ out of a random sample of $n = 590$
drivers were wearing their seat belts. Was the campaign successful?

(a) Define the null and alternative hypotheses.

(b) Define a critical region with an $\alpha = 0.01$ significance level.

(c) Determine the approximate $p$-value and state your conclusion.

6.50. A machine shop that manufactures toggle levers has both a day
and a night shift. A toggle lever is defective if a standard nut cannot be
screwed onto the threads. Let $p_1$ and $p_2$ be the proportion of defective
levers among those manufactured by the day and night shifts, respectively. We
shall test the null hypothesis, $H_0: p_1 = p_2$, against a two-sided alternative
hypothesis based on two random samples, each of 1000 levers taken from
the production of the respective shifts.

(a) Define the test statistic which has an approximate $N(0, 1)$ distribution.
Sketch a standard normal p.d.f. illustrating the critical region having
$\alpha = 0.05$.

(b) If $y_1 = 37$ and $y_2 = 53$ defectives were observed for the day and night
shifts, respectively, calculate the value of the test statistic and the
approximate $p$-value (note that this is a two-sided test). Locate the
calculated test statistic on your figure in part (a) and state your
conclusion.

6.51. In Exercise 6.28 we found a confidence interval for the variance $\sigma^2$
using the variance $S^2$ of a random sample of size $n$ arising from
$N(\mu, \sigma^2)$, where the mean $\mu$ is unknown. In testing
$H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$, use the critical
region defined by $nS^2/\sigma_0^2 \ge c$. That is, reject $H_0$ and accept $H_1$ if
$S^2 \ge c\sigma_0^2/n$. If $n = 13$ and the significance level $\alpha = 0.025$,
determine $c$.

6.52. In Exercise 6.37, in finding a confidence interval for the ratio of
the variances of two normal distributions, we used a statistic
$[nS_1^2/(n - 1)]/[mS_2^2/(m - 1)]$, which has an $F$-distribution when those two
variances are equal. If we denote that statistic by $F$, we can test
$H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 > \sigma_2^2$ using the
critical region $F \ge c$. If $n = 13$, $m = 11$, and $\alpha = 0.05$, find $c$.

6.6 Chi-Square Tests 

In this section we introduce tests of statistical hypotheses called
chi-square tests. A test of this sort was originally proposed by Karl
Pearson in 1900, and it provided one of the earlier methods of statistical
inference.

Let the random variable $X_i$ be $N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, and
let $X_1, X_2, \ldots, X_n$ be mutually independent. Thus the joint p.d.f. of these
variables is

$$\frac{1}{\sigma_1\sigma_2\cdots\sigma_n(2\pi)^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right], \qquad -\infty < x_i < \infty.$$

The random variable that is defined by the exponent (apart from
the coefficient $-\frac{1}{2}$) is $\sum_{i=1}^{n}(X_i - \mu_i)^2/\sigma_i^2$, and this
random variable is $\chi^2(n)$.

In Section 4.10 we generalized this joint normal distribution
of probability to $n$ random variables that are dependent and we call the
distribution a multivariate normal distribution. In Section 10.8, it will
be shown that a certain exponent in the joint p.d.f. (apart from a
coefficient of $-\frac{1}{2}$) defines a random variable that is $\chi^2(n)$. This
fact is the mathematical basis of the chi-square tests.

Let us now discuss some random variables that have approximate
chi-square distributions. Let $X_1$ be $b(n, p_1)$. Since the random variable
$Y = (X_1 - np_1)/\sqrt{np_1(1 - p_1)}$ has, as $n \to \infty$, a limiting
distribution that is $N(0, 1)$, we would strongly suspect that the limiting
distribution of $Z = Y^2$ is $\chi^2(1)$. This is, in fact, the case, as will now
be shown. If $G_n(y)$ represents the distribution function of $Y$, we know that

$$\lim_{n\to\infty} G_n(y) = \Phi(y), \qquad -\infty < y < \infty,$$

where $\Phi(y)$ is the distribution function of a distribution that is $N(0, 1)$.
Let $H_n(z)$ represent, for each positive integer $n$, the distribution
function of $Z = Y^2$. Thus, if $z \ge 0$,

$$H_n(z) = \Pr(Z \le z) = \Pr(-\sqrt{z} \le Y \le \sqrt{z}) = G_n(\sqrt{z}) - G_n[(-\sqrt{z})-].$$

Accordingly, since $\Phi(y)$ is everywhere continuous,

$$\lim_{n\to\infty} H_n(z) = \Phi(\sqrt{z}) - \Phi(-\sqrt{z}) = 2\int_0^{\sqrt{z}} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw.$$

If we change the variable of integration in this last integral by writing
$w^2 = v$, then

$$\lim_{n\to\infty} H_n(z) = \int_0^{z} \frac{1}{\Gamma(\frac{1}{2})\,2^{1/2}}\, v^{1/2 - 1} e^{-v/2}\, dv,$$

provided that $z \ge 0$. If $z < 0$, then $\lim_{n\to\infty} H_n(z) = 0$. Thus
$\lim_{n\to\infty} H_n(z)$ is equal to the distribution function of a random
variable that is $\chi^2(1)$. This is the desired result.
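
The convergence of $H_n(z)$ to the $\chi^2(1)$ distribution function can be
observed numerically. The following Python sketch is an illustration only:
the function names `chi2_1_cdf`, `binom_pmf`, and `exact_H_n`, the value
$p_1 = 0.3$, and the point $z = 3.841$ are assumptions made here, not taken
from the text. The limit is evaluated through the identity
$\Phi(\sqrt{z}) - \Phi(-\sqrt{z}) = \operatorname{erf}(\sqrt{z/2})$.

```python
from math import erf, exp, lgamma, log, sqrt


def chi2_1_cdf(z):
    """Limit derived above: Phi(sqrt z) - Phi(-sqrt z) for z > 0, else 0."""
    return erf(sqrt(z / 2.0)) if z > 0 else 0.0


def binom_pmf(x, n, p):
    """Binomial probability, computed on the log scale to avoid underflow."""
    log_pmf = (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1 - p))
    return exp(log_pmf)


def exact_H_n(z, n, p):
    """Pr(Y^2 <= z), where Y = (X - np)/sqrt(np(1-p)) and X is b(n, p)."""
    var = n * p * (1 - p)
    return sum(binom_pmf(x, n, p) for x in range(n + 1)
               if (x - n * p) ** 2 <= z * var)


for n in (25, 100, 400, 1600):
    print(n, round(exact_H_n(3.841, n, 0.3), 4), round(chi2_1_cdf(3.841), 4))
```

Because $X_1$ is discrete, the exact values do not approach the limit
monotonically, but the gap shrinks as $n$ grows.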

Let us now return to the random variable $X_1$ which is $b(n, p_1)$. Let
$X_2 = n - X_1$ and let $p_2 = 1 - p_1$. If we denote $Y^2$ by $Q_1$ instead of
$Z$, we see that $Q_1$ may be written as

$$Q_1 = \frac{(X_1 - np_1)^2}{np_1(1 - p_1)} = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_1 - np_1)^2}{n(1 - p_1)} = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}$$

because $(X_1 - np_1)^2 = (n - X_2 - n + np_2)^2 = (X_2 - np_2)^2$. Since $Q_1$ has
a limiting chi-square distribution with 1 degree of freedom, we say,
when $n$ is a positive integer, that $Q_1$ has an approximate chi-square
distribution with 1 degree of freedom. This result can be generalized
as follows.

Let $X_1, X_2, \ldots, X_{k-1}$ have a multinomial distribution with the
parameters $n, p_1, \ldots, p_{k-1}$, as in Section 3.1. As a convenience, let
$X_k = n - (X_1 + \cdots + X_{k-1})$ and let $p_k = 1 - (p_1 + \cdots + p_{k-1})$.
Define $Q_{k-1}$ by

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}.$$

It is proved in a more advanced course that, as $n \to \infty$, $Q_{k-1}$ has a
limiting distribution that is $\chi^2(k - 1)$. If we accept this fact, we can say
that $Q_{k-1}$ has an approximate chi-square distribution with $k - 1$
degrees of freedom when $n$ is a positive integer. Some writers caution
the user of this approximation to be certain that $n$ is large enough that
each $np_i$, $i = 1, 2, \ldots, k$, is at least equal to 5. In any case it is
important to realize that $Q_{k-1}$ does not have a chi-square distribution, only
an approximate chi-square distribution.

The random variable $Q_{k-1}$ may serve as the basis of the tests of
certain statistical hypotheses which we now discuss. Let the sample
space $\mathcal{A}$ of a random experiment be the union of a finite number
$k$ of mutually disjoint sets $A_1, A_2, \ldots, A_k$. Furthermore, let
$P(A_i) = p_i$, $i = 1, 2, \ldots, k$, where $p_k = 1 - p_1 - \cdots - p_{k-1}$, so
that $p_i$ is the probability that the outcome of the random experiment is an
element of the set $A_i$. The random experiment is to be repeated $n$ independent
times and $X_i$ will represent the number of times the outcome is an
element of the set $A_i$. That is, $X_1, X_2, \ldots, X_k = n - X_1 - \cdots - X_{k-1}$
are the frequencies with which the outcome is, respectively, an element
of $A_1, A_2, \ldots, A_k$. Then the joint p.d.f. of $X_1, X_2, \ldots, X_{k-1}$ is the
multinomial p.d.f. with the parameters $n, p_1, \ldots, p_{k-1}$. Consider the
simple hypothesis (concerning this multinomial p.d.f.)
$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_{k-1} = p_{k-1,0}$
$(p_k = p_{k0} = 1 - p_{10} - \cdots - p_{k-1,0})$, where
$p_{10}, \ldots, p_{k-1,0}$ are specified numbers. It is desired to test $H_0$
against all alternatives.

If the hypothesis $H_0$ is true, the random variable

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_{i0})^2}{np_{i0}}$$

has an approximate chi-square distribution with $k - 1$ degrees of
freedom. Since, when $H_0$ is true, $np_{i0}$ is the expected value of $X_i$, one
would feel intuitively that experimental values of $Q_{k-1}$ should not be
too large if $H_0$ is true. With this in mind, we may use Table II of
Appendix B, with $k - 1$ degrees of freedom, and find $c$ so that
$\Pr(Q_{k-1} \ge c) = \alpha$, where $\alpha$ is the desired significance level of
the test. If, then, the hypothesis $H_0$ is rejected when the observed value of
$Q_{k-1}$ is at least as great as $c$, the test of $H_0$ will have a significance
level that is approximately equal to $\alpha$.

Some illustrative examples follow. 

Example 1. One of the first six positive integers is to be chosen by a
random experiment (perhaps by the cast of a die). Let $A_i = \{x : x = i\}$,
$i = 1, 2, \ldots, 6$. The hypothesis $H_0: P(A_i) = p_{i0} = \frac{1}{6}$,
$i = 1, 2, \ldots, 6$, will be tested, at the approximate 5 percent significance
level, against all alternatives. To make the test, the random experiment will
be repeated, under the same conditions, 60 independent times. In this example
$k = 6$ and $np_{i0} = 60(\frac{1}{6}) = 10$, $i = 1, 2, \ldots, 6$. Let $X_i$ denote
the frequency with which the random experiment terminates with the outcome in
$A_i$, $i = 1, 2, \ldots, 6$, and let $Q_5 = \sum_{i=1}^{6} (X_i - 10)^2/10$. If $H_0$
is true, Table II, with $k - 1 = 6 - 1 = 5$ degrees of freedom, shows that we
have $\Pr(Q_5 \ge 11.1) = 0.05$. Now suppose that the experimental frequencies of
$A_1, A_2, \ldots, A_6$ are, respectively, 13, 19, 11, 8, 5, and 4. The observed
value of $Q_5$ is

$$\frac{(13 - 10)^2}{10} + \frac{(19 - 10)^2}{10} + \frac{(11 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(5 - 10)^2}{10} + \frac{(4 - 10)^2}{10} = 15.6.$$

Since $15.6 > 11.1$, the hypothesis $P(A_i) = \frac{1}{6}$, $i = 1, 2, \ldots, 6$, is
rejected at the (approximate) 5 percent significance level.
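
The statistic in Example 1 is simple to compute directly. Here is a minimal
Python sketch; the function name `pearson_chi_square` is chosen for
illustration, and the critical value 11.1 is the Table II value quoted in the
example rather than something the code derives.

```python
def pearson_chi_square(observed, probs):
    """Sum of (X_i - n p_i0)^2 / (n p_i0) over the k cells."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(observed, probs))


observed = [13, 19, 11, 8, 5, 4]          # frequencies of A_1, ..., A_6
q5 = pearson_chi_square(observed, [1 / 6] * 6)
critical_value = 11.1                      # Table II, 5 degrees of freedom

print(round(q5, 1), "reject H0:", q5 >= critical_value)
```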


Example 2. A point is to be selected from the unit interval $\{x : 0 < x < 1\}$
by a random process. Let $A_1 = \{x : 0 < x \le \frac{1}{4}\}$,
$A_2 = \{x : \frac{1}{4} < x \le \frac{1}{2}\}$,
$A_3 = \{x : \frac{1}{2} < x \le \frac{3}{4}\}$, and
$A_4 = \{x : \frac{3}{4} < x < 1\}$. Let the probabilities $p_i$, $i = 1, 2, 3, 4$,
assigned to these sets under the hypothesis be determined by the p.d.f. $2x$,
$0 < x < 1$, zero elsewhere. Then these probabilities are, respectively,

$$p_{10} = \int_0^{1/4} 2x\, dx = \tfrac{1}{16}, \qquad p_{20} = \tfrac{3}{16}, \qquad p_{30} = \tfrac{5}{16}, \qquad p_{40} = \tfrac{7}{16}.$$

Thus the hypothesis to be tested is that $p_1$, $p_2$, $p_3$, and
$p_4 = 1 - p_1 - p_2 - p_3$ have the preceding values in a multinomial
distribution with $k = 4$. This hypothesis is to be tested at an approximate
0.025 significance level by repeating the random experiment $n = 80$ independent
times under the same conditions. Here the $np_{i0}$, $i = 1, 2, 3, 4$, are,
respectively, 5, 15, 25, and 35. Suppose the observed frequencies of $A_1$,
$A_2$, $A_3$, and $A_4$ are 6, 18, 20, and 36, respectively. Then the observed
value of $Q_3 = \sum_{i=1}^{4} (X_i - np_{i0})^2/(np_{i0})$ is

$$\frac{(6 - 5)^2}{5} + \frac{(18 - 15)^2}{15} + \frac{(20 - 25)^2}{25} + \frac{(36 - 35)^2}{35} = \frac{64}{35} = 1.83,$$

approximately. From Table II, with $4 - 1 = 3$ degrees of freedom, the value
corresponding to a 0.025 significance level is $c = 9.35$. Since the observed
value of $Q_3$ is less than 9.35, the hypothesis is accepted at the (approximate)
0.025 level of significance.
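
The cell probabilities and the value $64/35$ in Example 2 can be verified with
exact rational arithmetic. This Python sketch assumes only what the example
states; the variable names are illustrative.

```python
from fractions import Fraction

# Cell boundaries 0, 1/4, 1/2, 3/4, 1. Under H0 the p.d.f. is 2x on (0, 1),
# so the distribution function is x^2 and each cell probability is b^2 - a^2.
cuts = [Fraction(i, 4) for i in range(5)]
p0 = [b**2 - a**2 for a, b in zip(cuts, cuts[1:])]

n = 80
observed = [6, 18, 20, 36]
expected = [n * p for p in p0]
q3 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))

print(p0, expected, q3, float(q3))
```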

Thus far we have used the chi-square test when the hypothesis $H_0$
is a simple hypothesis. More often we encounter hypotheses $H_0$ in
which the multinomial probabilities $p_1, p_2, \ldots, p_k$ are not completely
specified by the hypothesis $H_0$. That is, under $H_0$, these probabilities
are functions of unknown parameters. For illustration, suppose that
a certain random variable $Y$ can take on any real value. Let us partition
the space $\{y : -\infty < y < \infty\}$ into $k$ mutually disjoint sets
$A_1, A_2, \ldots, A_k$ so that the events $A_1, A_2, \ldots, A_k$ are mutually
exclusive and exhaustive. Let $H_0$ be the hypothesis that $Y$ is
$N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ unspecified. Then each

$$p_i = \int_{A_i} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right] dy, \qquad i = 1, 2, \ldots, k,$$

is a function of the unknown parameters $\mu$ and $\sigma^2$. Suppose that we take
a random sample $Y_1, \ldots, Y_n$ of size $n$ from this distribution. If we let
$X_i$ denote the frequency of $A_i$, $i = 1, 2, \ldots, k$, so that
$X_1 + \cdots + X_k = n$, the random variable

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}$$

cannot be computed once $X_1, \ldots, X_k$ have been observed, since each
$p_i$, and hence $Q_{k-1}$, is a function of the unknown parameters $\mu$ and
$\sigma^2$.

There is a way out of our trouble, however. We have noted that
$Q_{k-1}$ is a function of $\mu$ and $\sigma^2$. Accordingly, choose the values of
$\mu$ and $\sigma^2$ that minimize $Q_{k-1}$. Obviously, these values depend upon
the observed $X_1 = x_1, \ldots, X_k = x_k$ and are called minimum chi-square
estimates of $\mu$ and $\sigma^2$. These point estimates of $\mu$ and $\sigma^2$
enable us to compute numerically the estimates of each $p_i$. Accordingly, if
these values are used, $Q_{k-1}$ can be computed once $Y_1, Y_2, \ldots, Y_n$, and
hence $X_1, X_2, \ldots, X_k$, are observed. However, a very important aspect of
the fact, which we accept without proof, is that now $Q_{k-1}$ is approximately
$\chi^2(k - 3)$. That is, the number of degrees of freedom of the limiting
chi-square distribution of $Q_{k-1}$ is reduced by one for each parameter
estimated by the experimental data. This statement applies not only to
the problem at hand but also to more general situations. Two examples
will now be given. The first of these examples will deal with the test of
the hypothesis that two multinomial distributions are the same.

Remark. In many instances, such as that involving the mean $\mu$ and the
variance $\sigma^2$ of a normal distribution, minimum chi-square estimates are
difficult to compute. Hence other estimates, such as the maximum likelihood
estimates $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}^2 = S^2$, are used to evaluate
$p_i$ and $Q_{k-1}$. In general, $Q_{k-1}$ is not minimized by maximum likelihood
estimates, and thus its computed value is somewhat greater than it would be if
minimum chi-square estimates were used. Hence, when comparing it to a critical
value listed in the chi-square table with $k - 3$ degrees of freedom, there is a
greater chance of rejecting than there would be if the actual minimum of
$Q_{k-1}$ is used. Accordingly, the approximate significance level of such a test
will be somewhat higher than that value found in the table. This modification
should be kept in mind and, if at all possible, each $p_i$ should be estimated
using the frequencies $X_1, \ldots, X_k$ rather than using directly the
observations $Y_1, Y_2, \ldots, Y_n$ of the random sample.

Example 3. Let us consider two multinomial distributions with parameters
$n_j, p_{1j}, p_{2j}, \ldots, p_{kj}$, $j = 1, 2$, respectively. Let $X_{ij}$,
$i = 1, 2, \ldots, k$, $j = 1, 2$, represent the corresponding frequencies. If
$n_1$ and $n_2$ are large and the observations from one distribution are
independent of those from the other, the random variable

$$\sum_{j=1}^{2} \sum_{i=1}^{k} \frac{(X_{ij} - n_j p_{ij})^2}{n_j p_{ij}}$$

is the sum of two independent random variables, each of which we treat as
though it were $\chi^2(k - 1)$; that is, the random variable is approximately
$\chi^2(2k - 2)$. Consider the hypothesis

$$H_0: p_{11} = p_{12},\ p_{21} = p_{22},\ \ldots,\ p_{k1} = p_{k2},$$

where each $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, is unspecified. Thus we need
point estimates of these parameters. The maximum likelihood estimator of
$p_{i1} = p_{i2}$, based upon the frequencies $X_{ij}$, is
$(X_{i1} + X_{i2})/(n_1 + n_2)$, $i = 1, 2, \ldots, k$. Note that we need only
$k - 1$ point estimates, because we have a point estimate of $p_{k1} = p_{k2}$
once we have point estimates of the first $k - 1$ probabilities. In
accordance with the fact that has been stated, the random variable

$$\sum_{j=1}^{2} \sum_{i=1}^{k} \frac{\{X_{ij} - n_j[(X_{i1} + X_{i2})/(n_1 + n_2)]\}^2}{n_j[(X_{i1} + X_{i2})/(n_1 + n_2)]}$$

has an approximate $\chi^2$ distribution with $2k - 2 - (k - 1) = k - 1$ degrees
of freedom. Thus we are able to test the hypothesis that two multinomial
distributions are the same; this hypothesis is rejected when the computed value
of this random variable is at least as great as an appropriate number from
Table II, with $k - 1$ degrees of freedom.
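
The statistic of Example 3 can be sketched in a few lines of Python. The
function name `homogeneity_chi_square` and the frequencies used below are
hypothetical, chosen only to illustrate the pooled estimate
$(X_{i1} + X_{i2})/(n_1 + n_2)$ and the $k - 1$ degrees of freedom.

```python
def homogeneity_chi_square(sample1, sample2):
    """Return (statistic, degrees of freedom) for testing that two
    multinomial distributions over the same k cells are equal.
    Assumes every pooled cell count is positive."""
    n1, n2 = sum(sample1), sum(sample2)
    stat = 0.0
    for x1, x2 in zip(sample1, sample2):
        p_hat = (x1 + x2) / (n1 + n2)      # pooled estimate of p_i1 = p_i2
        stat += (x1 - n1 * p_hat) ** 2 / (n1 * p_hat)
        stat += (x2 - n2 * p_hat) ** 2 / (n2 * p_hat)
    return stat, len(sample1) - 1


# Hypothetical frequencies: k = 3 cells, n1 = n2 = 100.
q, df = homogeneity_chi_square([20, 30, 50], [30, 30, 40])
print(round(q, 4), df)
```

The computed value would then be compared with the Table II entry for
`df` degrees of freedom at the chosen significance level.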

The second example deals with the subject of contingency tables. 

Example 4. Let the result of a random experiment be classified by two
attributes (such as the color of the hair and the color of the eyes). That is, one
attribute of the outcome is one and only one of certain mutually exclusive and
exhaustive events, say $A_1, A_2, \ldots, A_a$; and the other attribute of the
outcome is also one and only one of certain mutually exclusive and exhaustive
events, say $B_1, B_2, \ldots, B_b$. Let $p_{ij} = P(A_i \cap B_j)$,
$i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$. The random experiment is to be
repeated $n$ independent times and $X_{ij}$ will denote the frequency of the
event $A_i \cap B_j$. Since there are $k = ab$ such events as $A_i \cap B_j$, the
random variable

$$Q_{ab-1} = \sum_{j=1}^{b} \sum_{i=1}^{a} \frac{(X_{ij} - np_{ij})^2}{np_{ij}}$$

has an approximate chi-square distribution with $ab - 1$ degrees of freedom,
provided that $n$ is large. Suppose that we wish to test the independence of
the $A$ attribute and the $B$ attribute; that is, we wish to test the hypothesis
$H_0: P(A_i \cap B_j) = P(A_i)P(B_j)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$.
Let us denote $P(A_i)$ by $p_{i.}$ and $P(B_j)$ by $p_{.j}$; thus

$$p_{i.} = \sum_{j=1}^{b} p_{ij}, \qquad p_{.j} = \sum_{i=1}^{a} p_{ij},$$

and

$$1 = \sum_{j=1}^{b} \sum_{i=1}^{a} p_{ij} = \sum_{j=1}^{b} p_{.j} = \sum_{i=1}^{a} p_{i.}.$$

Then the hypothesis can be formulated as $H_0: p_{ij} = p_{i.}p_{.j}$,
$i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$. To test $H_0$, we can use $Q_{ab-1}$
with $p_{ij}$ replaced by $p_{i.}p_{.j}$. But if $p_{i.}$, $i = 1, 2, \ldots, a$, and
$p_{.j}$, $j = 1, 2, \ldots, b$, are unknown, as they frequently are in
applications, we cannot compute $Q_{ab-1}$ once the frequencies are observed. In
such a case we estimate these unknown parameters by

$$\hat{p}_{i.} = \frac{X_{i.}}{n}, \quad\text{where } X_{i.} = \sum_{j=1}^{b} X_{ij}, \quad i = 1, 2, \ldots, a,$$

and

$$\hat{p}_{.j} = \frac{X_{.j}}{n}, \quad\text{where } X_{.j} = \sum_{i=1}^{a} X_{ij}, \quad j = 1, 2, \ldots, b.$$

Since $\sum_i p_{i.} = \sum_j p_{.j} = 1$, we have estimated only
$a - 1 + b - 1 = a + b - 2$ parameters. So if these estimates are used in
$Q_{ab-1}$, with $p_{ij} = \hat{p}_{i.}\hat{p}_{.j}$, then, according to the rule
that has been stated in this section, the random variable

$$\sum_{j=1}^{b} \sum_{i=1}^{a} \frac{[X_{ij} - n(X_{i.}/n)(X_{.j}/n)]^2}{n(X_{i.}/n)(X_{.j}/n)}$$

has an approximate chi-square distribution with
$ab - 1 - (a + b - 2) = (a - 1)(b - 1)$ degrees of freedom provided that $H_0$ is
true. The hypothesis $H_0$ is then rejected if the computed value of this
statistic exceeds the constant $c$, where $c$ is selected from Table II so that
the test has the desired significance level $\alpha$.
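
The contingency-table statistic of Example 4 can be sketched as follows. The
function name `independence_chi_square` and the 2 by 2 table of frequencies are
hypothetical, introduced only to illustrate the estimated expected counts
$n(X_{i.}/n)(X_{.j}/n)$ and the $(a - 1)(b - 1)$ degrees of freedom.

```python
def independence_chi_square(table):
    """Return (statistic, degrees of freedom) for an a-by-b table of
    frequencies X_ij, with rows indexed by A_i and columns by B_j.
    Assumes every row total and column total is positive."""
    a, b = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]                              # X_i.
    col_totals = [sum(table[i][j] for i in range(a)) for j in range(b)]   # X_.j
    stat = 0.0
    for i in range(a):
        for j in range(b):
            expected = row_totals[i] * col_totals[j] / n   # n (X_i./n)(X_.j/n)
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (a - 1) * (b - 1)


# Hypothetical 2-by-2 table with n = 60.
q, df = independence_chi_square([[10, 20], [20, 10]])
print(round(q, 4), df)
```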

In each of the four examples of this section, we have indicated that
the statistic used to test the hypothesis $H_0$ has an approximate
chi-square distribution, provided that $n$ is sufficiently large and $H_0$ is
true. To compute the power of any of these tests for values of the
parameters not described by $H_0$, we need the distribution of the statistic
when $H_0$ is not true. In each of these cases, the statistic has an
approximate distribution called a noncentral chi-square distribution.
The noncentral chi-square distribution will be discussed in
Section 10.3.


EXERCISES 

6.53. A number is to be selected from the interval $\{x : 0 < x < 2\}$ by a
random process. Let $A_i = \{x : (i - 1)/2 < x \le i/2\}$, $i = 1, 2, 3$, and let
$A_4 = \{x : \frac{3}{2} < x < 2\}$. A certain hypothesis assigns probabilities
$p_{i0}$ to these sets in accordance with
$p_{i0} = \int_{A_i} (\frac{1}{2})(2 - x)\, dx$, $i = 1, 2, 3, 4$. This hypothesis
(concerning the multinomial p.d.f. with $k = 4$) is to be tested, at the 5
percent level of significance, by a chi-square test. If the observed frequencies
of the sets $A_i$, $i = 1, 2, 3, 4$, are, respectively, 30, 30, 10, 10, would
$H_0$ be accepted at the (approximate) 5 percent level of significance?

6.54. Let the following sets be defined: $A_1 = \{x : -\infty < x \le 0\}$,
$A_i = \{x : i - 2 < x \le i - 1\}$, $i = 2, \ldots, 7$, and
$A_8 = \{x : 6 < x < \infty\}$. A certain hypothesis assigns probabilities
$p_{i0}$ to these sets $A_i$ in accordance with

$$p_{i0} = \int_{A_i} \frac{1}{2\sqrt{2\pi}} \exp\left[-\frac{(x - 3)^2}{2(4)}\right] dx, \qquad i = 1, 2, \ldots, 7, 8.$$

This hypothesis (concerning the multinomial p.d.f. with $k = 8$) is to be
tested, at the 5 percent level of significance, by a chi-square test. If the
observed frequencies of the sets $A_i$, $i = 1, 2, \ldots, 8$, are, respectively,
60, 96, 140, 210, 172, 160, 88, and 74, would $H_0$ be accepted at the
(approximate) 5 percent level of significance?

6.55. A die was cast $n = 120$ independent times and the following data
resulted:

    Spots up     1     2     3     4     5     6
    Frequency    b    20    20    20    20    40 - b

If we use a chi-square test, for what values of $b$ would the hypothesis that
the die is unbiased be rejected at the 0.025 significance level?

6.56. Consider the problem from genetics of crossing two types of peas.
The Mendelian theory states that the probabilities of the classifications
(a) round and yellow, (b) wrinkled and yellow, (c) round and green, and
(d) wrinkled and green are $\frac{9}{16}$, $\frac{3}{16}$, $\frac{3}{16}$, and
$\frac{1}{16}$, respectively. If, from 160 independent observations, the
observed frequencies of these respective classifications are 86, 35, 26, and
13, are these data consistent with the Mendelian theory? That is, test, with
$\alpha = 0.01$, the hypothesis that the respective probabilities are
$\frac{9}{16}$, $\frac{3}{16}$, $\frac{3}{16}$, and $\frac{1}{16}$.

6.57. Two different teaching procedures were used on two different groups
of students. Each group contained 100 students of about the same ability.
At the end of the term, an evaluating team assigned a letter grade to each
student. The results were tabulated as follows.

                      Grade
    Group     A     B     C     D     F    Total
    I        15    25    32    17    11     100
    II        9    18    29    28    16     100

If we consider these data to be independent observations from two
respective multinomial distributions with $k = 5$, test, at the 5 percent
significance level, the hypothesis that the two distributions are the same
(and hence the two teaching procedures are equally effective).

6.58. Let the result of a random experiment be classified as one of the mutually
exclusive and exhaustive ways $A_1$, $A_2$, $A_3$ and also as one of the mutually
exclusive and exhaustive ways $B_1$, $B_2$, $B_3$, $B_4$. Two hundred independent
trials of the experiment result in the following data:

              $B_1$   $B_2$   $B_3$   $B_4$
    $A_1$      10      21      15       6
    $A_2$      11      27      21      13
    $A_3$       6      19      27      24

Test, at the 0.05 significance level, the hypothesis of independence of the
$A$ attribute and the $B$ attribute, namely $H_0: P(A_i \cap B_j) = P(A_i)P(B_j)$,
$i = 1, 2, 3$ and $j = 1, 2, 3, 4$, against the alternative of dependence.

6.59. A certain genetic model suggests that the probabilities of a particular
trinomial distribution are, respectively, $p_1 = p^2$, $p_2 = 2p(1 - p)$, and
$p_3 = (1 - p)^2$, where $0 < p < 1$. If $X_1$, $X_2$, $X_3$ represent the
respective frequencies in $n$ independent trials, explain how we could check on
the adequacy of the genetic model.

6.60. Let the result of a random experiment be classified as one of the mutually
exclusive and exhaustive ways $A_1$, $A_2$, $A_3$ and also as one of the
mutually exclusive and exhaustive ways $B_1$, $B_2$, $B_3$, $B_4$. Say that 180
independent trials of the experiment result in the following frequencies:

              $B_1$      $B_2$      $B_3$      $B_4$
    $A_1$    15 - 3k    15 - k     15 + k     15 + 3k
    $A_2$    15         15         15         15
    $A_3$    15 + 3k    15 + k     15 - k     15 - 3k

where $k$ is one of the integers 0, 1, 2, 3, 4, 5. What is the smallest value of
$k$ that will lead to the rejection of the independence of the $A$ attribute and
the $B$ attribute at the $\alpha = 0.05$ significance level?

6.61. It is proposed to fit the Poisson distribution to the following data

    x            0     1     2     3    3 < x
    Frequency   20    40    16    18      6

(a) Compute the corresponding chi-square goodness-of-fit statistic.
Hint: In computing the mean, treat $3 < x$ as $x = 4$.

(b) How many degrees of freedom are associated with this chi-square?

(c) Do these data result in the rejection of the Poisson model at the
$\alpha = 0.05$ significance level?

ADDITIONAL EXERCISES 

6.62. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample
of size $n$ from the distribution having p.d.f. $f(x) = 2x/\theta^2$,
$0 < x < \theta$, zero elsewhere.

(a) If $0 < c < 1$, show that $\Pr(c < Y_n/\theta < 1) = 1 - c^{2n}$.

(b) If $n = 5$ and if the observed value of $Y_n$ is 1.8, find a 99 percent
confidence interval for $\theta$.

6.63. If 0.35, 0.92, 0.56, and 0.71 are the four observed values of a random
sample from a distribution having p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$,
$0 < x < 1$, zero elsewhere, find an estimate for $\theta$.

6.64. Let the table

    x            0     1     2     3     4     5
    Frequency    6    10    14    13     6     1

represent a summary of a random sample of size 50 from a Poisson
distribution. Find the maximum likelihood estimate of $\Pr(X = 2)$.

6.65. Let $X$ be $N(\mu, 100)$. To test $H_0: \mu = 80$ against $H_1: \mu > 80$, let
the critical region be defined by
$C = \{(x_1, x_2, \ldots, x_{25}) : \bar{x} \ge 83\}$, where $\bar{x}$ is the
sample mean of a random sample of size $n = 25$ from this distribution.

(a) How is the power function $K(\mu)$ defined for this test?

(b) What is the significance level of this test?

(c) What are the values of $K(80)$, $K(83)$, and $K(86)$?

(d) Sketch the graph of the power function.

(e) What is the $p$-value corresponding to $\bar{x} = 83.41$?

6.66. Let $X$ equal the yield of alfalfa in tons per acre per year. Assume that
$X$ is $N(1.5, 0.09)$. It is hoped that new fertilizer will increase the average
yield. We shall test the null hypothesis $H_0: \mu = 1.5$ against the alternative
hypothesis $H_1: \mu > 1.5$. Assume that the variance continues to equal
$\sigma^2 = 0.09$ with the new fertilizer. Using $\bar{X}$, the mean of a random
sample of size $n$, as the test statistic, reject $H_0$ if $\bar{x} \ge c$. Find
$n$ and $c$ so that the power function $K(\mu) = \Pr(\bar{X} \ge c;\ \mu)$ is such
that $\alpha = K(1.5) = 0.05$ and $K(1.7) = 0.95$.

6.67. A random sample of 100 observations from a Poisson distribution has 
a mean equal to 6.25. Construct an approximate 95 percent confidence 
interval for the mean of the distribution. 

6.68. Say that a random sample of size 25 is taken from a binomial
distribution with parameters n = 5 and p. These data are then lost, but we 
recall that the relative frequency of the value 5 was Under these 
conditions, how would you estimate $p$? Is this suggested estimate unbiased?

6.69. When 100 tacks were thrown on a table, 60 of them landed point up. 
Obtain a 95 percent confidence interval for the probability that a tack of 
this type will land point up. Assume independence.

6.70. Let $X_1, X_2, \ldots, X_8$ be a random sample of size $n = 8$ from a Poisson
distribution with mean $\mu$. Reject the simple null hypothesis $H_0: \mu = 0.5$
and accept $H_1: \mu > 0.5$ if the observed sum $\sum_{i=1}^{8} x_i \ge 8$.

(a) Compute the significance level $\alpha$ of the test.

(b) Find the power function $K(\mu)$ of the test as a sum of Poisson
probabilities.

(c) Using the Appendix, determine $K(0.75)$, $K(1)$, and $K(1.25)$.

6.71. Let $p$ denote the probability that, for a particular tennis player, the
first serve is good. Since $p = 0.40$, this player decided to take lessons in
order to increase $p$. When the lessons are completed, the hypothesis
$H_0: p = 0.40$ will be tested against $H_1: p > 0.40$ based on $n = 25$ trials. Let
$y$ equal the number of first serves that are good, and let the critical region
be defined by $C = \{y : y \ge 13\}$.

(a) Determine $\alpha = \Pr(Y \ge 13; p = 0.40)$.

(b) Find $\beta = \Pr(Y < 13)$ when $p = 0.60$; that is, $\beta = \Pr(Y \le 12;
p = 0.60)$.

6.72. The mean birth weight in the United States is $\mu = 3315$ grams with a
standard deviation of $\sigma = 575$. Let $X$ equal the birth weight in grams in
Jerusalem. Assume that the distribution of $X$ is $N(\mu, \sigma^2)$. We shall test
the null hypothesis $H_0: \mu = 3315$ against the alternative hypothesis
$H_1: \mu < 3315$ using a random sample of size $n = 30$.

(a) Define a critical region that has a significance level of $\alpha = 0.05$.

(b) If the random sample of $n = 30$ yielded $\bar{x} = 3189$ and $s = 488$, what is
your conclusion?

(c) What is the approximate $p$-value of your test?

6.73. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size 5 from the distribution having p.d.f. $f(x) = \exp[-(x - \theta)/\beta]/\beta$,
$\theta < x < \infty$, zero elsewhere. Discuss the construction of a 90 percent
confidence interval for $\beta$ if $\theta$ is known.


6.74. Three independent random samples, each of size 6, are drawn from three
normal distributions having common unknown variance. We find the three
sample variances to be 10, 14, and 8, respectively.

(a) Compute an unbiased estimate of the common variance. 

(b) Determine a 90 percent confidence interval for the common variance. 


6.75. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.

(a) If the constant $b$ is defined by the equation $\Pr(X \le b) = 0.90$, find the
m.l.e. of $b$.

(b) If $c$ is a given constant, find the m.l.e. of $\Pr(X \le c)$.


6.76. Let $\bar{X}_1$, $\bar{X}_2$, and $\bar{X}_3$ and $S_1^2$, $S_2^2$, and $S_3^2$ denote the means and the variances
of three independent random samples, each of size 10, from a normal
distribution with mean $\mu$ and variance $\sigma^2$. Find the constant $c$ so that

$$\Pr\left(\frac{\bar{X}_1 + \bar{X}_2 - 2\bar{X}_3}{\sqrt{10S_1^2 + 10S_2^2 + 10S_3^2}} < c\right) = 0.95.$$


6.77. Let $Y$ be $b(192, p)$. We reject $H_0: p = 0.75$ and accept $H_1: p > 0.75$
if and only if $Y \ge 152$. Use the normal approximation to determine:

(a) $\alpha = \Pr(Y \ge 152; p = 0.75)$.

(b) $\beta = \Pr(Y < 152)$ when $p = 0.80$.

6.78. Let $Y$ be $b(100, p)$. To test $H_0: p = 0.08$ against $H_1: p < 0.08$, we reject
$H_0$ and accept $H_1$ if and only if $Y \le 6$.

(a) Determine the significance level $\alpha$ of the test.

(b) Find the probability of the type II error if in fact $p = 0.04$.

6.79. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli distribution
with parameter $p$. If $p$ is restricted so that we know that $\frac{1}{2} \le p \le 1$, find the
m.l.e. of this parameter.

6.80. Consider two Bernoulli distributions with unknown parameters $p_1$ and
$p_2$, respectively. If $Y$ and $Z$ equal the numbers of successes in two
independent random samples, each of sample size $n$, from the respective
distributions, determine the maximum likelihood estimators of $p_1$ and $p_2$ if
we know that $0 \le p_1 \le p_2 \le 1$.

6.81. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ i.i.d. pairs of random
variables, each with the bivariate normal distribution having five
parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

(a) Show that $Z_i = X_i - Y_i$ is $N(\mu, \sigma^2)$, where $\mu = \mu_1 - \mu_2$ and
$\sigma^2 = \sigma_1^2 - 2\rho\sigma_1\sigma_2 + \sigma_2^2$, $i = 1, 2, \ldots, n$.

(b) Since all five parameters are unknown, $\mu$ and $\sigma^2$ are unknown. To test
$H_0: \mu = 0$ ($H_0: \mu_1 = \mu_2$) against $H_1: \mu > 0$ ($H_1: \mu_1 > \mu_2$), construct a
$t$-test based upon the mean and the variance of the $n$ differences
$Z_1, Z_2, \ldots, Z_n$. This is often called a paired $t$-test.



CHAPTER 7


Sufficient Statistics


7.1 Measures of Quality of Estimators 

In Chapter 6 we presented some procedures for finding point 
estimates, interval estimates, and tests of statistical hypotheses. In this 
and the next two chapters, we provide reasons why certain statistics are 
used in these various statistical inferences. We begin by considering 
desirable properties of a point estimate. 

Now it would seem that if $y = u(x_1, x_2, \ldots, x_n)$ is to qualify as a
good point estimate of $\theta$, there should be a great probability that the
statistic $Y = u(X_1, X_2, \ldots, X_n)$ will be close to $\theta$; that is, $\theta$ should be
a sort of rallying point for the numbers $y = u(x_1, x_2, \ldots, x_n)$. This can
be achieved in one way by selecting $Y = u(X_1, X_2, \ldots, X_n)$ in such a
way that not only is $Y$ an unbiased estimator of $\theta$, but also the variance
of $Y$ is as small as it can be made. We do this because the variance of
$Y$ is a measure of the intensity of the concentration of the probability
for $Y$ in the neighborhood of the point $\theta = E(Y)$. Accordingly, we
define an unbiased minimum variance estimator of the parameter $\theta$ in
the following manner.

Definition 1. For a given positive integer $n$, $Y = u(X_1, X_2, \ldots, X_n)$
will be called an unbiased minimum variance estimator of the
parameter $\theta$ if $Y$ is unbiased, that is, $E(Y) = \theta$, and if the variance of $Y$
is less than or equal to the variance of every other unbiased estimator
of $\theta$.

For illustration, let $X_1, X_2, \ldots, X_9$ denote a random sample from
a distribution that is $N(\theta, 1)$, $-\infty < \theta < \infty$. Since the statistic
$\bar{X} = (X_1 + X_2 + \cdots + X_9)/9$ is $N(\theta, \frac{1}{9})$, $\bar{X}$ is an unbiased estimator of
$\theta$. The statistic $X_1$ is $N(\theta, 1)$, so $X_1$ is also an unbiased estimator of
$\theta$. Although the variance $\frac{1}{9}$ of $\bar{X}$ is less than the variance 1 of $X_1$, we
cannot say, with $n = 9$, that $\bar{X}$ is the unbiased minimum variance
estimator of $\theta$; that definition requires that the comparison be made
with every unbiased estimator of $\theta$. To be sure, it is quite impossible
to tabulate all other unbiased estimators of this parameter $\theta$, so other
methods must be developed for making the comparisons of the
variances. A beginning on this problem will be made in this chapter.
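The comparison of $\bar{X}$ and $X_1$ in this illustration can be checked informally by simulation. The following Python sketch is only an illustration under stated assumptions: the function name `simulate_estimators`, the value $\theta = 2$, the number of replications, and the seed are all hypothetical choices, not part of the text. It draws repeated samples of size 9 from $N(\theta, 1)$ and records both estimators.

```python
import random
import statistics


def simulate_estimators(theta=2.0, n=9, reps=20000, seed=0):
    """Simulate Xbar and X1 for samples of size n from N(theta, 1).

    theta, reps, and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    means, firsts = [], []
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)   # observed value of Xbar
        firsts.append(sample[0])        # observed value of X1
    return {
        "mean_of_xbar": statistics.fmean(means),
        "var_of_xbar": statistics.pvariance(means),
        "mean_of_x1": statistics.fmean(firsts),
        "var_of_x1": statistics.pvariance(firsts),
    }


result = simulate_estimators()
print(result)
```

Both simulated means should be near $\theta$, while the simulated variances should be near $\frac{1}{9}$ and 1, respectively. The simulation compares only these two estimators; it says nothing about the remaining unbiased estimators, which is exactly the difficulty noted above.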

Let us now discuss the problem of point estimation of a parameter
from a slightly different standpoint. Let $X_1, X_2, \ldots, X_n$ denote a
random sample of size $n$ from a distribution that has the p.d.f. $f(x; \theta)$,
$\theta \in \Omega$. The distribution may be either of the continuous or the discrete
type. Let $Y = u(X_1, X_2, \ldots, X_n)$ be a statistic on which we wish to base
a point estimate of the parameter $\theta$. Let $\delta(y)$ be that function of the
observed value of the statistic $Y$ which is the point estimate of $\theta$. Thus
the function $\delta$ decides the value of our point estimate of $\theta$ and $\delta$ is called
a decision function or a decision rule. One value of the decision function,
say $\delta(y)$, is called a decision. Thus a numerically determined point
estimate of a parameter $\theta$ is a decision. Now a decision may be correct
or it may be wrong. It would be useful to have a measure of the
seriousness of the difference, if any, between the true value of $\theta$ and the
point estimate $\delta(y)$. Accordingly, with each pair, $[\theta, \delta(y)]$, $\theta \in \Omega$, we
will associate a nonnegative number $\mathcal{L}[\theta, \delta(y)]$ that reflects this
seriousness. We call the function $\mathcal{L}$ the loss function. The expected
(mean) value of the loss function is called the risk function. If $g(y; \theta)$,
$\theta \in \Omega$, is the p.d.f. of $Y$, the risk function $R(\theta, \delta)$ is given by

$$R(\theta, \delta) = E\{\mathcal{L}[\theta, \delta(Y)]\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, g(y; \theta)\, dy$$

if $Y$ is a random variable of the continuous type. It would be desirable
to select a decision function that minimizes the risk $R(\theta, \delta)$ for all values
of $\theta$, $\theta \in \Omega$. But this is usually impossible because the decision function
$\delta$ that minimizes $R(\theta, \delta)$ for one value of $\theta$ may not minimize
$R(\theta, \delta)$ for another value of $\theta$. Accordingly, we need either to restrict
our decision function to a certain class or to consider methods of
ordering the risk functions. The following example, while very simple,
dramatizes these difficulties.


Example 1. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from a distribution
that is $N(\theta, 1)$, $-\infty < \theta < \infty$. Let $Y = \bar{X}$, the mean of the random sample,
and let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. We shall compare the two decision functions
given by $\delta_1(y) = y$ and $\delta_2(y) = 0$ for $-\infty < y < \infty$. The corresponding risk
functions are

$$R(\theta, \delta_1) = E[(\theta - Y)^2] = \tfrac{1}{25}$$

and

$$R(\theta, \delta_2) = E[(\theta - 0)^2] = \theta^2.$$

Obviously, if, in fact, $\theta = 0$, then $\delta_2(y) = 0$ is an excellent decision and we have
$R(0, \delta_2) = 0$. However, if $\theta$ differs from zero by very much, it is equally
clear that $\delta_2(y) = 0$ is a poor decision. For example, if, in fact, $\theta = 2$,
$R(2, \delta_2) = 4 > R(2, \delta_1) = \frac{1}{25}$. In general, we see that $R(\theta, \delta_2) < R(\theta, \delta_1)$,
provided that $-\frac{1}{5} < \theta < \frac{1}{5}$, and that otherwise $R(\theta, \delta_2) \ge R(\theta, \delta_1)$. That is, one
of these decision functions is better than the other for some values of $\theta$ and
the other decision function is better for other values of $\theta$. If, however, we
had restricted our consideration to decision functions $\delta$ such that $E[\delta(Y)] = \theta$
for all values of $\theta$, $\theta \in \Omega$, then the decision $\delta_2(y) = 0$ is not allowed. Under this
restriction and with the given $\mathcal{L}[\theta, \delta(y)]$, the risk function is the variance of
the unbiased estimator $\delta(Y)$, and we are confronted with the problem of
finding the unbiased minimum variance estimator. Later in this chapter we
show that the solution is $\delta(y) = y = \bar{x}$.

Suppose, however, that we do not want to restrict ourselves to decision
functions $\delta$, such that $E[\delta(Y)] = \theta$ for all values of $\theta$, $\theta \in \Omega$. Instead, let us
say that the decision function that minimizes the maximum of the risk
function is the best decision function. Because, in this example, $R(\theta, \delta_2) = \theta^2$
is unbounded, $\delta_2(y) = 0$ is not, in accordance with this criterion, a good
decision function. On the other hand, with $-\infty < \theta < \infty$, we have

$$\max_{\theta} R(\theta, \delta_1) = \max_{\theta} \left(\tfrac{1}{25}\right) = \tfrac{1}{25}.$$

Accordingly, $\delta_1(y) = y = \bar{x}$ seems to be a very good decision in accordance
with this criterion because $\frac{1}{25}$ is small. As a matter of fact, it can be proved that
$\delta_1$ is the best decision function, as measured by the minimax criterion, when
the loss function is $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$.
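The two risk functions of Example 1 are simple enough to tabulate exactly. The sketch below is an informal check, not part of the original example: the grid of $\theta$ values and the function names `risk_delta1` and `risk_delta2` are hypothetical. It uses exact rational arithmetic to list the grid points where $\delta_2$ has the smaller risk and to compare the maximum risks over the grid.

```python
from fractions import Fraction

N = 25  # sample size used in Example 1


def risk_delta1(theta):
    """Risk of delta_1(y) = y under squared-error loss: Var(Xbar) = 1/N."""
    return Fraction(1, N)


def risk_delta2(theta):
    """Risk of delta_2(y) = 0 under squared-error loss: theta**2."""
    return Fraction(theta) ** 2


# Hypothetical grid: theta = -3.0, -2.9, ..., 3.0 (exact tenths).
grid = [Fraction(k, 10) for k in range(-30, 31)]

# Grid points where the constant decision 0 has strictly smaller risk.
better_for_delta2 = [t for t in grid if risk_delta2(t) < risk_delta1(t)]

# Maximum risk of each rule over the grid (delta_2 grows with the grid width).
max_risk_delta1 = max(risk_delta1(t) for t in grid)
max_risk_delta2 = max(risk_delta2(t) for t in grid)

print([str(t) for t in better_for_delta2])
print(max_risk_delta1, max_risk_delta2)
```

On this grid only $\theta = -\frac{1}{10}, 0, \frac{1}{10}$ favor $\delta_2$, in agreement with the interval $-\frac{1}{5} < \theta < \frac{1}{5}$, and widening the grid increases the maximum risk of $\delta_2$ without bound while that of $\delta_1$ stays at $\frac{1}{25}$.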

In this example we illustrated the following: 


1. Without some restriction on the decision function, it is difficult to
find a decision function that has a risk function which is uniformly
less than the risk function of another decision function.

2. A principle of selecting a best decision function, called the minimax
principle. This principle may be stated as follows: If the decision
function given by $\delta_0(y)$ is such that, for all $\theta \in \Omega$,

$$\max_{\theta} R[\theta, \delta_0(y)] \le \max_{\theta} R[\theta, \delta(y)]$$

for every other decision function $\delta(y)$, then $\delta_0(y)$ is called a minimax
decision function.

With the restriction $E[\delta(Y)] = \theta$ and the loss function
$\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$, the decision function that minimizes the risk
function yields an unbiased estimator with minimum variance. If,
however, the restriction $E[\delta(Y)] = \theta$ is replaced by some other
condition, the decision function $\delta(Y)$, if it exists, which minimizes
$E\{[\theta - \delta(Y)]^2\}$ uniformly in $\theta$ is sometimes called the minimum
mean-square-error estimator. Exercises 7.6, 7.7, and 7.8 provide
examples of this type of estimator.

There are two additional observations about decision rules and loss
functions that should be made at this point. First, since $Y$ is a statistic,
the decision rule $\delta(Y)$ is also a statistic, and we could have started
directly with a decision rule based on the observations in a random
sample, say $\delta_1(X_1, X_2, \ldots, X_n)$. The risk function is then given by

$$R(\theta, \delta_1) = E\{\mathcal{L}[\theta, \delta_1(X_1, \ldots, X_n)]\}
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta_1(x_1, \ldots, x_n)]\, f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n$$

if the random sample arises from a continuous-type distribution. We
did not do this because, as you will see in this chapter, it is rather easy
to find a good statistic, say $Y$, upon which to base all of the statistical
inferences associated with a particular model. Thus we thought it more
appropriate to start with a statistic that would be familiar, like the
m.l.e. $Y = \bar{X}$ in Example 1. The second decision rule of that example
could be written $\delta_2(X_1, X_2, \ldots, X_n) = 0$, a constant no matter what
values of $X_1, X_2, \ldots, X_n$ are observed.

The second observation is that we have only used one loss
function, namely the square-error loss function $\mathcal{L}(\theta, \delta) = (\theta - \delta)^2$.
The absolute-error loss function $\mathcal{L}(\theta, \delta) = |\theta - \delta|$ is another popular
one. The loss function defined by

$$\mathcal{L}(\theta, \delta) = \begin{cases} 0, & |\theta - \delta| \le a, \\ b, & |\theta - \delta| > a, \end{cases}$$

where $a$ and $b$ are positive constants, is sometimes referred to as the
goal post loss function. The reason for this terminology is that football
fans recognize it is like kicking a field goal: There is no loss (actually
a three-point gain) if within $a$ units of the middle but $b$ units of loss
(zero points awarded) if outside that restriction. In addition, loss
functions can be asymmetric as well as symmetric as the three previous
ones have been. That is, for example, it might be more costly to
underestimate the value of $\theta$ than to overestimate it. (Many of us think
about this type of loss function when estimating the time it takes us
to reach an airport to catch a plane.) Some of these loss functions are
considered when studying Bayesian estimates in Chapter 8.
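The loss functions just described are easy to write down as code. The following Python sketch is only illustrative: the function names are hypothetical, the boundary $|\theta - \delta| = a$ is assumed to carry zero loss, and the asymmetric loss with costs `under_cost` and `over_cost` is an invented example of the kind of asymmetry mentioned above, not a loss function taken from the text.

```python
def squared_error_loss(theta, delta):
    """Square-error loss (theta - delta)**2."""
    return (theta - delta) ** 2


def absolute_error_loss(theta, delta):
    """Absolute-error loss |theta - delta|."""
    return abs(theta - delta)


def goal_post_loss(theta, delta, a, b):
    """Goal post loss: 0 within a units of theta, b otherwise.

    Assumption: the boundary |theta - delta| == a counts as no loss.
    """
    return 0.0 if abs(theta - delta) <= a else b


def asymmetric_linear_loss(theta, delta, under_cost=3.0, over_cost=1.0):
    """Hypothetical asymmetric loss: underestimating theta costs more
    per unit (under_cost) than overestimating it (over_cost)."""
    error = delta - theta
    return under_cost * (-error) if error < 0 else over_cost * error


theta = 10.0
for delta in (8.0, 10.5, 12.0):
    print(
        delta,
        squared_error_loss(theta, delta),
        absolute_error_loss(theta, delta),
        goal_post_loss(theta, delta, a=1.0, b=7.0),
        asymmetric_linear_loss(theta, delta),
    )
```

With $\theta = 10$, the estimates 8 and 12 are equally bad under the three symmetric losses, but the asymmetric loss charges 6 for the underestimate and only 2 for the overestimate.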

Let us close this section with an interesting illustration that raises
a question leading to the likelihood principle which many statisticians
believe is a quality characteristic that estimators should enjoy. Suppose
that two statisticians, A and B, observe 10 independent trials of a
random experiment ending in success or failure. Let the probability of
success on each trial be $\theta$, where $0 < \theta < 1$. Let us say that each
statistician observes one success in these 10 trials. Suppose, however,
that A had decided to take $n = 10$ such observations in advance and
found only one success while B had decided to take as many
observations as needed to get the first success, which happened on the
10th trial. The model of A is that $Y$ is $b(n = 10, \theta)$ and $y = 1$ is observed.
On the other hand, B is considering the random variable $Z$ that has
a geometric p.d.f. $g(z) = (1 - \theta)^{z-1}\theta$, $z = 1, 2, 3, \ldots$, and $z = 10$ is
observed. In either case, the relative frequency of success is

$$\frac{y}{n} = \frac{1}{z} = \frac{1}{10},$$

which could be used as an estimate of $\theta$.

Let us observe, however, that one of the corresponding estimators,
$Y/n$ and $1/Z$, is biased. We have

$$E\left(\frac{Y}{10}\right) = \frac{1}{10}E(Y) = \frac{1}{10}(10\theta) = \theta$$

while

$$E\left(\frac{1}{Z}\right) = \sum_{z=1}^{\infty} \frac{1}{z}(1 - \theta)^{z-1}\theta
= \theta + \tfrac{1}{2}(1 - \theta)\theta + \tfrac{1}{3}(1 - \theta)^2\theta + \cdots > \theta.$$

That is, $1/Z$ is a biased estimator while $Y/10$ is unbiased. Thus A is using
an unbiased estimator while B is not. Should we adjust B's estimator
so that it too is unbiased?

It is interesting to note that if we maximize the two respective
likelihood functions, namely

$$L_1(\theta) = \binom{10}{y}\theta^{y}(1 - \theta)^{10 - y}$$

and

$$L_2(\theta) = (1 - \theta)^{z-1}\theta,$$

with $n = 10$, $y = 1$, and $z = 10$, we get exactly the same answer, $\hat{\theta} = \frac{1}{10}$.
This must be the case, because in each situation we are maximizing
$(1 - \theta)^{9}\theta$. Many statisticians believe that this is the way it should be
and accordingly adopt the likelihood principle:

Suppose two different sets of data from possibly two different random
experiments lead to respective likelihood functions, $L_1(\theta)$ and $L_2(\theta)$, that are
proportional to each other. These two data sets provide the same
information about the parameter $\theta$ and a statistician should obtain the
same estimate of $\theta$ from either.

In our special illustration, we note that $L_1(\theta) \propto L_2(\theta)$, and the
likelihood principle states that statisticians A and B should make the
same inference. Thus believers in the likelihood principle would not
adjust the second estimator to make it unbiased.
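The illustration with statisticians A and B can be verified numerically. The sketch below is an informal check under stated assumptions: the function names, the grid of $\theta$ values, and the number of series terms are hypothetical choices. It evaluates the binomial and geometric likelihoods for $y = 1$, $n = 10$, and $z = 10$, locates each maximum on the grid, confirms that the ratio of the two likelihoods is constant in $\theta$, and approximates $E(1/Z)$ by a truncated series.

```python
from math import comb


def binomial_likelihood(theta, n=10, y=1):
    """Likelihood for statistician A: y successes in n fixed trials."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


def geometric_likelihood(theta, z=10):
    """Likelihood for statistician B: first success on trial z."""
    return (1 - theta) ** (z - 1) * theta


# Hypothetical grid of theta values strictly inside (0, 1).
grid = [k / 1000 for k in range(1, 1000)]

mle_a = max(grid, key=binomial_likelihood)
mle_b = max(grid, key=geometric_likelihood)

# Proportional likelihoods: the ratio should be the same for every theta.
ratios = [binomial_likelihood(t) / geometric_likelihood(t) for t in grid]


def expected_inverse_geometric(theta, terms=5000):
    """Truncated series for E(1/Z) when Z is geometric with parameter theta."""
    return sum((1 - theta) ** (z - 1) * theta / z for z in range(1, terms + 1))


print(mle_a, mle_b)
print(min(ratios), max(ratios))
print(expected_inverse_geometric(0.1))
```

Both maximizers fall at $\theta = 0.1$ and the ratio $L_1(\theta)/L_2(\theta)$ stays at the constant $\binom{10}{1} = 10$, while the truncated series for $E(1/Z)$ at $\theta = 0.1$ exceeds $0.1$, which is the bias noted above.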

EXERCISES 

7.1. Show that the mean $\bar{X}$ of a random sample of size $n$ from a distribution
having p.d.f. $f(x; \theta) = (1/\theta)e^{-(x/\theta)}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere,
is an unbiased estimator of $\theta$ and has variance $\theta^2/n$.

7.2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution
with mean zero and variance $\theta$, $0 < \theta < \infty$. Show that $\sum_{i=1}^{n} X_i^2/n$ is an unbiased
estimator of $\theta$ and has variance $2\theta^2/n$.

7.3. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 from
the uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, $0 < \theta < \infty$,
zero elsewhere. Show that $4Y_1$, $2Y_2$, and $\frac{4}{3}Y_3$ are all unbiased estimators of
$\theta$. Find the variance of each of these unbiased estimators.


7.4. Let $Y_1$ and $Y_2$ be two independent unbiased estimators of $\theta$. Say the
variance of $Y_1$ is twice the variance of $Y_2$. Find the constants $k_1$ and $k_2$ so
that $k_1Y_1 + k_2Y_2$ is an unbiased estimator with smallest possible variance
for such a linear combination.


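7.5. In Example 1 of this section, take $\mathcal{L}[\theta, \delta(y)] = |\theta - \delta(y)|$. Show that
$R(\theta, \delta_1) = \frac{1}{5}\sqrt{2/\pi}$ and $R(\theta, \delta_2) = |\theta|$. Of these two decision functions $\delta_1$ and
$\delta_2$, which yields the smaller maximum risk?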

7.6. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution
with parameter $\theta$, $0 < \theta < \infty$. Let $Y = \sum_{i=1}^{n} X_i$ and let
$\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. If we restrict our considerations to decision functions of the
form $\delta(y) = b + y/n$, where $b$ does not depend upon $y$, show that
$R(\theta, \delta) = b^2 + \theta/n$. What decision function of this form yields a uniformly
smaller risk than every other decision function of this form? With this
solution, say $\delta$, and $0 < \theta < \infty$, determine $\max_{\theta} R(\theta, \delta)$ if it exists.

7.7. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$N(\mu, \theta)$, $0 < \theta < \infty$, where $\mu$ is unknown. Let $Y = \sum_{i=1}^{n} (X_i - \bar{X})^2/n = S^2$ and
let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. If we consider decision functions of the form
$\delta(y) = by$, where $b$ does not depend upon $y$, show that
$R(\theta, \delta) = (\theta^2/n^2)[(n^2 - 1)b^2 - 2n(n - 1)b + n^2]$. Show that $b = n/(n + 1)$ yields a
minimum risk for decision functions of this form. Note that $nY/(n + 1)$ is
not an unbiased estimator of $\theta$. With $\delta(y) = ny/(n + 1)$ and $0 < \theta < \infty$,
determine $\max_{\theta} R(\theta, \delta)$ if it exists.

7.8. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$b(1, \theta)$, $0 \le \theta \le 1$. Let $Y = \sum_{i=1}^{n} X_i$ and let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. Consider
decision functions of the form $\delta(y) = by$, where $b$ does not depend upon $y$.
Prove that $R(\theta, \delta) = b^2n\theta(1 - \theta) + (bn - 1)^2\theta^2$. Show that

$$\max_{\theta} R(\theta, \delta) = \frac{b^4n^2}{4[b^2n - (bn - 1)^2]},$$

provided that the value $b$ is such that $b^2n \ge 2(bn - 1)^2$. Prove that $b = 1/n$
does not minimize $\max_{\theta} R(\theta, \delta)$.

7.9. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with
mean $\theta > 0$.

(a) Statistician A observes the sample to be the values $x_1, x_2, \ldots, x_n$ with
sum $y_1 = \sum x_i$. Find the m.l.e. of $\theta$.

(b) Statistician B loses the sample values $x_1, x_2, \ldots, x_n$ but remembers the
sum $y_1$ and the fact that the sample arose from a Poisson distribution.
Thus B decides to create some fake observations which he calls
$z_1, z_2, \ldots, z_n$ (as he knows they will probably not equal the original
$x$-values) as follows. He notes that the conditional probability of
independent Poisson random variables $Z_1, Z_2, \ldots, Z_n$ being equal to
$z_1, z_2, \ldots, z_n$, given $\sum z_i = y_1$, is

$$\frac{\dfrac{\theta^{z_1}e^{-\theta}}{z_1!}\,\dfrac{\theta^{z_2}e^{-\theta}}{z_2!}\cdots\dfrac{\theta^{z_n}e^{-\theta}}{z_n!}}{\dfrac{(n\theta)^{y_1}e^{-n\theta}}{y_1!}}
= \frac{y_1!}{z_1!\,z_2!\cdots z_n!}\left(\frac{1}{n}\right)^{z_1}\left(\frac{1}{n}\right)^{z_2}\cdots\left(\frac{1}{n}\right)^{z_n}$$

since $Y_1 = \sum Z_i$ has a Poisson distribution with mean $n\theta$. The latter
distribution is multinomial with $y_1$ independent trials, each terminating
in one of $n$ mutually exclusive and exhaustive ways, each of which has
the same probability $1/n$. Accordingly, B runs such a multinomial
experiment with $y_1$ independent trials and obtains $z_1, z_2, \ldots, z_n$. Find the
likelihood function using these $z$-values. Is it proportional to that of
statistician A?

Hint: Here the likelihood function is the product of this conditional
p.d.f. and the p.d.f. of $Y_1 = \sum Z_i$.

7.2 A Sufficient Statistic for a Parameter 

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a
distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. In Chapter 6 and Section 7.1
we constructed statistics to make statistical inferences as illustrated by
point and interval estimation and tests of statistical hypotheses. We
note that a statistic, say $Y = u(X_1, X_2, \ldots, X_n)$, is a form of data
reduction. For illustration, instead of listing all of the individual
observations $X_1, X_2, \ldots, X_n$, we might prefer to give only the sample
mean $\bar{X}$ or the sample variance $S^2$. Thus statisticians look for ways of
reducing a set of data so that these data can be more easily understood
without losing the meaning associated with the entire set of
observations.

It is interesting to note that a statistic $Y = u(X_1, X_2, \ldots, X_n)$ really
partitions the sample space of $X_1, X_2, \ldots, X_n$. For illustration,
suppose we say that the sample was observed and $\bar{x} = 8.32$. There are
many points in the sample space which have that same mean of 8.32,
and we can consider them as belonging to the set $\{(x_1, x_2, \ldots, x_n) :
\bar{x} = 8.32\}$. As a matter of fact, all points on the hyperplane

$$x_1 + x_2 + \cdots + x_n = (8.32)n$$

yield the mean of $\bar{x} = 8.32$, so this hyperplane is that set. However,
there are many values that $\bar{X}$ can take and thus there are many
such sets. So, in this sense, the sample mean $\bar{X}$, or any statistic
$Y = u(X_1, X_2, \ldots, X_n)$, partitions the sample space into a collection
of sets.

Often in the study of statistics the parameter $\theta$ of the model is
unknown; thus we desire to make some statistical inference about it. In
this section we consider a statistic denoted by $Y_1 = u_1(X_1, X_2, \ldots, X_n)$,
which we call a sufficient statistic and which we find is good for making
those inferences. This sufficient statistic partitions the sample space in
such a way that, given

$$(X_1, X_2, \ldots, X_n) \in \{(x_1, x_2, \ldots, x_n) : u_1(x_1, x_2, \ldots, x_n) = y_1\},$$

the conditional probability of $X_1, X_2, \ldots, X_n$ does not depend upon $\theta$.
Intuitively, this means that once the set determined by $Y_1 = y_1$ is fixed,
the distribution of another statistic, say $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, does
not depend upon the parameter $\theta$ because the conditional distribution
of $X_1, X_2, \ldots, X_n$ does not depend upon $\theta$. Hence it is impossible to
use $Y_2$, given $Y_1 = y_1$, to make a statistical inference about $\theta$. So, in a
sense, $Y_1$ exhausts all the information about $\theta$ that is contained in the
sample. This is why we call $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ a sufficient
statistic.

To understand clearly the definition of a sufficient statistic for a
parameter $\theta$, we start with an illustration.

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the
distribution that has p.d.f.

$$f(x; \theta) = \theta^{x}(1 - \theta)^{1-x}, \quad x = 0, 1; \quad 0 < \theta < 1;$$
$$= 0 \text{ elsewhere.}$$

The statistic $Y_1 = X_1 + X_2 + \cdots + X_n$ has the p.d.f.

$$g_1(y_1; \theta) = \binom{n}{y_1}\theta^{y_1}(1 - \theta)^{n - y_1}, \quad y_1 = 0, 1, \ldots, n,$$
$$= 0 \text{ elsewhere.}$$

What is the conditional probability

$$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y_1 = y_1) = P(A \mid B),$$

say, where $y_1 = 0, 1, 2, \ldots, n$? Unless the sum of the integers $x_1, x_2, \ldots, x_n$
(each of which equals zero or 1) is equal to $y_1$, the conditional probability
obviously equals zero because $A \cap B = \emptyset$. But in the case $y_1 = \sum x_i$, we have
that $A \subset B$ so that $A \cap B = A$ and $P(A \mid B) = P(A)/P(B)$; thus the conditional
probability equals

$$\frac{\theta^{x_1}(1 - \theta)^{1-x_1}\,\theta^{x_2}(1 - \theta)^{1-x_2}\cdots\theta^{x_n}(1 - \theta)^{1-x_n}}{\dbinom{n}{y_1}\theta^{y_1}(1 - \theta)^{n-y_1}}
= \frac{\theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}}{\dbinom{n}{\sum x_i}\theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}}
= \frac{1}{\dbinom{n}{\sum x_i}}.$$

Since $y_1 = x_1 + x_2 + \cdots + x_n$ equals the number of 1's in the $n$
independent trials, this conditional probability is the probability of selecting
a particular arrangement of $y_1$ 1's and $(n - y_1)$ zeros. Note that this
conditional probability does not depend upon the value of the parameter $\theta$.
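The conclusion of this example can be checked by brute-force enumeration for a small sample. The Python sketch below is only an illustration: the function name `conditional_probability`, the observed point $(1, 0, 0, 1, 0)$, and the three values of $\theta$ are hypothetical choices. It computes $P(A \mid B)$ directly from the definition, obtaining $P(B)$ by summing the joint p.d.f. over every arrangement with the same sum.

```python
from itertools import product
from math import comb


def conditional_probability(x, theta):
    """Pr(X1 = x1, ..., Xn = xn | Y1 = sum(x)) for Bernoulli(theta) trials."""
    n, y1 = len(x), sum(x)
    # Joint probability of the observed arrangement (event A).
    joint = theta**y1 * (1 - theta) ** (n - y1)
    # P(Y1 = y1), summing the joint p.d.f. over all points with the same sum (event B).
    prob_sum = sum(
        theta ** sum(pt) * (1 - theta) ** (n - sum(pt))
        for pt in product((0, 1), repeat=n)
        if sum(pt) == y1
    )
    return joint / prob_sum


x = (1, 0, 0, 1, 0)  # hypothetical observed sample with n = 5, y1 = 2
values = [conditional_probability(x, th) for th in (0.2, 0.5, 0.9)]
print(values, 1 / comb(5, 2))
```

For each value of $\theta$ the result agrees with $1/\binom{5}{2} = 0.1$ up to rounding, so the conditional probability is free of $\theta$, as the algebra shows.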


In general, let $g_1(y_1; \theta)$ be the p.d.f. of the statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ is a random sample arising
from a distribution of the discrete type having p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The
conditional probability of $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, given
$Y_1 = y_1$, equals

$$\frac{f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]},$$

provided that $x_1, x_2, \ldots, x_n$ are such that the fixed
$y_1 = u_1(x_1, x_2, \ldots, x_n)$, and equals zero otherwise. We say that
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if this
ratio does not depend upon $\theta$. While, with distributions of the
continuous type, we cannot use the same argument, we do, in this case,
accept the fact that if this ratio does not depend upon $\theta$, then the
conditional distribution of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$, does not
depend upon $\theta$. Thus, in both cases, we use the same definition of a
sufficient statistic for $\theta$.


Definition 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample
of size $n$ from a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a statistic whose p.d.f. is $g_1(y_1; \theta)$. Then $Y_1$
is a sufficient statistic for $\theta$ if and only if

$$\frac{f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]} = H(x_1, x_2, \ldots, x_n),$$

where $H(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta \in \Omega$.


Remark. In most cases in this book, $X_1, X_2, \ldots, X_n$ do represent the
observations of a random sample; that is, they are i.i.d. It is not necessary,
however, in more general situations, that these random variables be
independent; as a matter of fact, they do not need to be identically distributed.
Thus, more generally, the definition of sufficiency of a statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ would be extended to read that

$$\frac{f(x_1, x_2, \ldots, x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]} = H(x_1, x_2, \ldots, x_n)$$

does not depend upon $\theta \in \Omega$, where $f(x_1, x_2, \ldots, x_n; \theta)$ is the joint p.d.f.
of $X_1, X_2, \ldots, X_n$. There are even a few situations in which we need an
extension like this one in this book.


We now give two examples that are illustrative of the definition. 

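Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma
distribution with $\alpha = 2$ and $\beta = \theta > 0$. Since the m.g.f. associated with this
distribution is $M(t) = (1 - \theta t)^{-2}$, $t < 1/\theta$, the m.g.f. of $Y_1 = \sum_{i=1}^{n} X_i$ is

$$E[e^{t(X_1 + X_2 + \cdots + X_n)}] = E(e^{tX_1})E(e^{tX_2})\cdots E(e^{tX_n})
= [(1 - \theta t)^{-2}]^{n} = (1 - \theta t)^{-2n}.$$

Thus $Y_1$ has a gamma distribution with $\alpha = 2n$ and $\beta = \theta$, so that its p.d.f. is

$$g_1(y_1; \theta) = \frac{1}{\Gamma(2n)\theta^{2n}}\, y_1^{2n-1}e^{-y_1/\theta}, \quad 0 < y_1 < \infty,$$
$$= 0 \text{ elsewhere.}$$

Thus we have that the ratio in Definition 2 equals

$$\frac{\left[\dfrac{x_1^{2-1}e^{-x_1/\theta}}{\Gamma(2)\theta^{2}}\right]\left[\dfrac{x_2^{2-1}e^{-x_2/\theta}}{\Gamma(2)\theta^{2}}\right]\cdots\left[\dfrac{x_n^{2-1}e^{-x_n/\theta}}{\Gamma(2)\theta^{2}}\right]}{\dfrac{(x_1 + x_2 + \cdots + x_n)^{2n-1}e^{-(x_1 + x_2 + \cdots + x_n)/\theta}}{\Gamma(2n)\theta^{2n}}}
= \frac{\Gamma(2n)}{[\Gamma(2)]^{n}}\,\frac{x_1x_2\cdots x_n}{(x_1 + x_2 + \cdots + x_n)^{2n-1}},$$

where $0 < x_i < \infty$, $i = 1, 2, \ldots, n$. Since this ratio does not depend upon $\theta$,
the sum $Y_1$ is a sufficient statistic for $\theta$.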

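Example 3. Let $Y_1 < Y_2 < \cdots < Y_n$ denote the order statistics of a
random sample of size $n$ from the distribution with p.d.f.

$$f(x; \theta) = e^{-(x - \theta)}I_{(\theta, \infty)}(x).$$

Here we use the indicator function of set $A$ defined by

$$I_A(x) = 1, \quad x \in A,$$
$$= 0, \quad x \notin A.$$

This means, of course, that $f(x; \theta) = e^{-(x - \theta)}$, $\theta < x < \infty$, and zero elsewhere.
The p.d.f. of $Y_1 = \min(X_i)$ is

$$g_1(y_1; \theta) = ne^{-n(y_1 - \theta)}I_{(\theta, \infty)}(y_1).$$

Thus we have that

$$\frac{\prod_{i=1}^{n} e^{-(x_i - \theta)}I_{(\theta, \infty)}(x_i)}{ne^{-n(\min x_i - \theta)}I_{(\theta, \infty)}(\min x_i)}
= \frac{e^{-x_1 - x_2 - \cdots - x_n}}{ne^{-n \min x_i}},$$

since $\prod_{i=1}^{n} I_{(\theta, \infty)}(x_i) = I_{(\theta, \infty)}(\min x_i)$, because when $\theta < \min x_i$, then $\theta < x_i$,
$i = 1, 2, \ldots, n$, and at least one $x$-value is less than or equal to $\theta$ when
$\min x_i \le \theta$. Since this ratio does not depend upon $\theta$, the first order statistic
$Y_1$ is a sufficient statistic for $\theta$.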


If we are to show, by means of the definition, that a certain
statistic $Y_1$ is or is not a sufficient statistic for a parameter $\theta$, we must
first of all know the p.d.f. of $Y_1$, say $g_1(y_1; \theta)$. In some instances it may
be quite tedious to find this p.d.f. Fortunately, this problem can be
avoided if we will but prove the following factorization theorem of
Neyman.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from
a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if we can find
two nonnegative functions, $k_1$ and $k_2$, such that

$$f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)
= k_1[u_1(x_1, x_2, \ldots, x_n); \theta]\,k_2(x_1, x_2, \ldots, x_n),$$

where $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$.

Proof. We shall prove the theorem when the random variables
are of the continuous type. Assume that the factorization is as
stated in the theorem. In our proof we shall make the one-to-one
transformation $y_1 = u_1(x_1, \ldots, x_n)$, $y_2 = u_2(x_1, \ldots, x_n)$, $\ldots$,
$y_n = u_n(x_1, \ldots, x_n)$ having the inverse functions $x_1 = w_1(y_1, \ldots, y_n)$,
$x_2 = w_2(y_1, \ldots, y_n)$, $\ldots$, $x_n = w_n(y_1, \ldots, y_n)$ and Jacobian $J$. The
joint p.d.f. of the statistics $Y_1, Y_2, \ldots, Y_n$ is then given by

$$g(y_1, y_2, \ldots, y_n; \theta) = k_1(y_1; \theta)\,k_2(w_1, w_2, \ldots, w_n)\,|J|,$$

where $w_i = w_i(y_1, y_2, \ldots, y_n)$, $i = 1, 2, \ldots, n$. The p.d.f. of $Y_1$, say
$g_1(y_1; \theta)$, is given by

$$g_1(y_1; \theta) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_n; \theta)\, dy_2 \cdots dy_n
= k_1(y_1; \theta)\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |J|\,k_2(w_1, w_2, \ldots, w_n)\, dy_2 \cdots dy_n.$$

Now the function $k_2$ does not depend upon $\theta$. Nor is $\theta$ involved in
either the Jacobian $J$ or the limits of integration. Hence the $(n - 1)$-fold
integral in the right-hand member of the preceding equation is
a function of $y_1$ alone, say $m(y_1)$. Thus

$$g_1(y_1; \theta) = k_1(y_1; \theta)\,m(y_1).$$

If $m(y_1) = 0$, then $g_1(y_1; \theta) = 0$. If $m(y_1) > 0$, we can write

$$k_1[u_1(x_1, \ldots, x_n); \theta] = \frac{g_1[u_1(x_1, \ldots, x_n); \theta]}{m[u_1(x_1, \ldots, x_n)]},$$

and the assumed factorization becomes

$$f(x_1; \theta)\cdots f(x_n; \theta) = g_1[u_1(x_1, \ldots, x_n); \theta]\,\frac{k_2(x_1, \ldots, x_n)}{m[u_1(x_1, \ldots, x_n)]}.$$

Since neither the function $k_2$ nor the function $m$ depends upon $\theta$, then
in accordance with the definition, $Y_1$ is a sufficient statistic for the
parameter $\theta$.

Conversely, if $Y_1$ is a sufficient statistic for $\theta$, the factorization can
be realized by taking the function $k_1$ to be the p.d.f. of $Y_1$, namely the
function $g_1$. This completes the proof of the theorem.


Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$, where the variance $\sigma^2 > 0$ is known.
If $\bar{x} = \sum_{i=1}^{n} x_i/n$, then

$$\sum_{i=1}^{n}(x_i - \theta)^2 = \sum_{i=1}^{n}[(x_i - \bar{x}) + (\bar{x} - \theta)]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$$

because

$$2\sum_{i=1}^{n}(x_i - \bar{x})(\bar{x} - \theta) = 2(\bar{x} - \theta)\sum_{i=1}^{n}(x_i - \bar{x}) = 0.$$

Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ may be written

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\sum_{i=1}^{n}(x_i - \theta)^2/2\sigma^2\right]
= \left\{\exp[-n(\bar{x} - \theta)^2/2\sigma^2]\right\}\left\{\frac{\exp\left[-\sum_{i=1}^{n}(x_i - \bar{x})^2/2\sigma^2\right]}{(\sigma\sqrt{2\pi})^{n}}\right\}.$$

Since the first factor of the right-hand member of this equation depends upon
$x_1, x_2, \ldots, x_n$ only through $\bar{x}$, and since the second factor does not depend
upon $\theta$, the factorization theorem implies that the mean $\bar{X}$ of the sample is,
for any particular value of $\sigma^2$, a sufficient statistic for $\theta$, the mean of the normal
distribution.
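The algebraic identity that drives Example 4 can be confirmed numerically. The sketch below is an informal check with hypothetical ingredients: the helper names `sum_sq_about` and `decomposition_gap`, the simulated data, and the trial values of $\theta$ are all assumptions made for illustration. It compares $\sum (x_i - \theta)^2$ with $\sum (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$.

```python
import random


def sum_sq_about(xs, center):
    """Sum of squared deviations of xs about the given center."""
    return sum((x - center) ** 2 for x in xs)


def decomposition_gap(xs, theta):
    """Left side minus right side of the identity used in Example 4."""
    n = len(xs)
    xbar = sum(xs) / n
    lhs = sum_sq_about(xs, theta)
    rhs = sum_sq_about(xs, xbar) + n * (xbar - theta) ** 2
    return lhs - rhs


rng = random.Random(1)                           # arbitrary seed
xs = [rng.gauss(3.0, 2.0) for _ in range(12)]    # hypothetical data
gaps = [decomposition_gap(xs, theta) for theta in (-1.0, 0.0, 3.0, 7.5)]
print(gaps)
```

Each gap is zero up to floating-point rounding, so $\theta$ enters the exponent of the joint p.d.f. only through the term $n(\bar{x} - \theta)^2$, which is the first factor in the factorization.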

We could have used the definition in the preceding example because
we know that $\bar{X}$ is $N(\theta, \sigma^2/n)$. Let us now consider an example in which
the use of the definition is inappropriate.

Example 5. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution with p.d.f.

$$f(x; \theta) = \theta x^{\theta - 1}, \quad 0 < x < 1,$$
$$= 0 \text{ elsewhere,}$$

where $0 < \theta$. We shall use the factorization theorem to prove that the product
$u_1(X_1, X_2, \ldots, X_n) = X_1X_2\cdots X_n$ is a sufficient statistic for $\theta$. The joint p.d.f.
of $X_1, X_2, \ldots, X_n$ is

$$\theta^{n}(x_1x_2\cdots x_n)^{\theta - 1} = [\theta^{n}(x_1x_2\cdots x_n)^{\theta}]\left(\frac{1}{x_1x_2\cdots x_n}\right),$$

where $0 < x_i < 1$, $i = 1, 2, \ldots, n$. In the factorization theorem let

$$k_1[u_1(x_1, x_2, \ldots, x_n); \theta] = \theta^{n}(x_1x_2\cdots x_n)^{\theta}$$

and

$$k_2(x_1, x_2, \ldots, x_n) = \frac{1}{x_1x_2\cdots x_n}.$$

Since $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$, the product $X_1X_2\cdots X_n$ is
a sufficient statistic for $\theta$.
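The factorization in Example 5 can be spot-checked numerically. In the sketch below the function names `joint_pdf`, `k1`, and `k2`, the sample point, and the trial values of $\theta$ are hypothetical; the code simply evaluates both sides of the factorization and reports their difference.

```python
from math import prod


def joint_pdf(xs, theta):
    """Joint p.d.f. of the sample when f(x; theta) = theta * x**(theta - 1)."""
    return prod(theta * x ** (theta - 1) for x in xs)


def k1(u, theta, n):
    """Factor depending on the data only through the product u = x1*...*xn."""
    return theta**n * u**theta


def k2(xs):
    """Factor that does not involve theta."""
    return 1.0 / prod(xs)


xs = (0.2, 0.5, 0.9)  # hypothetical sample point in (0, 1)
gaps = [
    joint_pdf(xs, th) - k1(prod(xs), th, len(xs)) * k2(xs)
    for th in (0.5, 1.0, 3.0)
]
print(gaps)
```

The differences vanish up to rounding for every trial $\theta$, which matches the factorization; of course a numerical check at a few points only illustrates the identity and does not replace the algebra.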


There is a tendency for some readers to apply incorrectly the
factorization theorem in those instances in which the domain of
positive probability density depends upon the parameter $\theta$. This is due
to the fact that they do not give proper consideration to the domain
of the function $k_2(x_1, x_2, \ldots, x_n)$. This will be illustrated in the next
example.

Example 6. In Example 3 with $f(x; \theta) = e^{-(x - \theta)} I_{(\theta, \infty)}(x)$, it was found
that the first order statistic $Y_1$ is a sufficient statistic for $\theta$. To illustrate our
point about not considering the domain of the function, take $n = 3$ and note
that

$$e^{-(x_1 - \theta)} e^{-(x_2 - \theta)} e^{-(x_3 - \theta)} = \left[e^{-3 \max x_i + 3\theta}\right]\left[e^{-x_1 - x_2 - x_3 + 3 \max x_i}\right]$$

or a similar expression. Certainly, in the latter formula, there is no $\theta$ in the
second factor and it might be assumed that $Y_3 = \max X_i$ is a sufficient
statistic for $\theta$. Of course, this is incorrect because we should have written the
joint p.d.f. of $X_1, X_2, X_3$ as

$$\prod_{i=1}^{3} \left[e^{-(x_i - \theta)} I_{(\theta, \infty)}(x_i)\right] = \left[e^{3\theta} I_{(\theta, \infty)}(\min x_i)\right]\left[\exp\left(-\sum_{i=1}^{3} x_i\right)\right]$$

because $I_{(\theta, \infty)}(\min x_i) = I_{(\theta, \infty)}(x_1) I_{(\theta, \infty)}(x_2) I_{(\theta, \infty)}(x_3)$. A similar statement cannot
be made with $\max x_i$. Thus $Y_1 = \min X_i$ is the sufficient statistic for $\theta$, not
$Y_3 = \max X_i$.
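The point of Example 6 is that the product of the three indicator functions collapses to an indicator of the minimum, not of the maximum. A small sketch can confirm this on a grid of values; the sample values and the grid of theta values below are hypothetical and chosen only for illustration.

```python
def indicator(theta, x):
    """I_(theta, infinity)(x): 1 if x > theta, else 0."""
    return 1 if x > theta else 0

def product_of_indicators(theta, xs):
    """Product of I_(theta, infinity)(x_i) over the sample."""
    result = 1
    for x in xs:
        result *= indicator(theta, x)
    return result

xs = [2.0, 3.5, 5.0]              # hypothetical sample, n = 3
thetas = [1.0, 2.5, 4.0, 6.0]     # hypothetical parameter values

product = [product_of_indicators(t, xs) for t in thetas]
via_min = [indicator(t, min(xs)) for t in thetas]
via_max = [indicator(t, max(xs)) for t in thetas]

print(product)   # agrees with via_min for every theta
print(via_min)
print(via_max)   # differs whenever min(xs) <= theta < max(xs)
```

For theta between the smallest and largest observation the joint p.d.f. is zero, yet the indicator of the maximum equals 1, which is why the factorization through the maximum fails.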


EXERCISES 


7.10. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(0, \theta)$, $0 < \theta < \infty$. Show that $\sum_{i=1}^{n} X_i^2$ is a sufficient statistic for $\theta$.

7.11. Prove that the sum of the observations of a random sample of size n
from a Poisson distribution having parameter $\theta$, $0 < \theta < \infty$, is a sufficient
statistic for $\theta$.

7.12. Show that the nth order statistic of a random sample of size n from the
uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, $0 < \theta < \infty$,
zero elsewhere, is a sufficient statistic for $\theta$. Generalize this result by
considering the p.d.f. $f(x; \theta) = Q(\theta)M(x)$, $0 < x < \theta$, $0 < \theta < \infty$, zero
elsewhere. Here, of course,

$$\int_{0}^{\theta} M(x)\, dx = \frac{1}{Q(\theta)}.$$

7.13. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a geometric
distribution that has p.d.f. $f(x; \theta) = (1 - \theta)^{x} \theta$, $x = 0, 1, 2, \ldots$, $0 < \theta < 1$,
zero elsewhere. Show that $\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

7.14. Show that the sum of the observations of a random sample of size n
from a gamma distribution that has p.d.f. $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$,
$0 < \theta < \infty$, zero elsewhere, is a sufficient statistic for $\theta$.

7.15. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a beta distribution
with parameters $\alpha = \theta > 0$ and $\beta = 2$. Show that the product $X_1 X_2 \cdots X_n$
is a sufficient statistic for $\theta$.

7.16. Show that the product of the sample observations is a sufficient statistic
for $\theta > 0$ if the random sample is taken from a gamma distribution with
parameters $\alpha = \theta$ and $\beta = 6$.

7.17. What is the sufficient statistic for $\theta$ if the sample arises from a beta
distribution in which $\alpha = \beta = \theta > 0$?


7.3 Properties of a Sufficient Statistic 

Suppose that a random sample $X_1, X_2, \ldots, X_n$ is taken from a
distribution with p.d.f. $f(x; \theta)$ that depends upon one parameter $\theta \in \Omega$.
Say that a sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$ exists and
has p.d.f. $g_1(y_1; \theta)$. Now consider two statisticians, A and B. The first
statistician, A, has all of the observed data $x_1, x_2, \ldots, x_n$; but the
second, B, has only the value $y_1$ of the sufficient statistic. Clearly, A has
as much information as does B. However, it turns out that B is as well
off as A in making statistical inferences about $\theta$ in the following sense.
Since the conditional probability of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$,
does not depend upon $\theta$, statistician B can create some pseudo
observations, say $Z_1, Z_2, \ldots, Z_n$, that provide a likelihood function
that is proportional to that based on $X_1, X_2, \ldots, X_n$, with the factor
$g_1(y_1; \theta)$ being common to each likelihood. The other factors of the two
likelihood functions do not depend upon $\theta$. Hence, in either case,
inferences, like the m.l.e. of $\theta$, would be based upon the sufficient
statistic $Y_1$.

To make this clear, we provide two illustrations. The first is based
upon Example 1 of Section 7.2. There the ratio of the likelihood
function and the p.d.f. of $Y_1$ is

$$\frac{L(\theta)}{g_1(y_1; \theta)} = \frac{1}{\binom{n}{y_1}},$$

where $y_1 = \sum_{i=1}^{n} x_i$. Recall that each $x_i$ is equal to zero or 1, and thus $y_1$
is the sum of $y_1$ ones and $(n - y_1)$ zeros. Say we know only the value $y_1$
and not $x_1, x_2, \ldots, x_n$; so we create pseudovalues $z_1, z_2, \ldots, z_n$ by
arranging at random $y_1$ ones and $(n - y_1)$ zeros so that the probability
of each arrangement is $p = 1/\binom{n}{y_1}$. Thus the probability that these
z-values equal the original x-values is p, and hence it is highly
unlikely, namely with probability p, that those two sets of values
would be equal. Yet the two likelihood functions are proportional,
namely

$$\theta^{\sum x_i} (1 - \theta)^{n - \sum x_i} \propto \theta^{\sum z_i} (1 - \theta)^{n - \sum z_i}$$

because $y_1 = \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} z_i$. Clearly, the m.l.e. of $\theta$, using either
expression, is $y_1/n$.
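The construction of pseudovalues in the Bernoulli illustration can be sketched in a few lines. This is only an illustration under stated assumptions: the observed sample, the seed, and the function names are hypothetical, and the shuffle stands in for "arranging at random".

```python
import random
from math import comb, isclose

def bernoulli_likelihood(values, theta):
    """Likelihood theta^(sum) * (1 - theta)^(n - sum) of a 0/1 sample."""
    s, n = sum(values), len(values)
    return theta ** s * (1 - theta) ** (n - s)

def pseudo_sample(n, y1, rng):
    """Arrange y1 ones and n - y1 zeros at random (all arrangements equally likely)."""
    z = [1] * y1 + [0] * (n - y1)
    rng.shuffle(z)
    return z

rng = random.Random(0)                      # fixed seed, for reproducibility only
x = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]          # hypothetical observed data, known to A
y1 = sum(x)                                 # all that statistician B knows
z = pseudo_sample(len(x), y1, rng)          # B's pseudovalues

p = 1 / comb(len(x), y1)                    # probability that z reproduces x exactly
same_likelihood = all(
    isclose(bernoulli_likelihood(x, t), bernoulli_likelihood(z, t))
    for t in (0.1, 0.4, 0.8)
)
print(p, same_likelihood)
```

Here the two likelihoods are not merely proportional but equal, since each depends on the data only through the common sum.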

The next illustration refers back to Exercise 7.9. There the sample
arose from a Poisson distribution with parameter $\theta > 0$. It turns
out that $Y_1 = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$ (see Exercise 7.11). In
Exercise 7.9 we found that

$$\frac{L(\theta)}{g_1(y_1; \theta)} = \frac{y_1!}{x_1!\, x_2! \cdots x_n!} \left(\frac{1}{n}\right)^{x_1} \left(\frac{1}{n}\right)^{x_2} \cdots \left(\frac{1}{n}\right)^{x_n}$$

when $L(\theta)$ is the likelihood function based upon $x_1, x_2, \ldots, x_n$. Since
this is a multinomial distribution that does not depend upon $\theta$, we can
generate some values of $Z_1, Z_2, \ldots, Z_n$, say $z_1, z_2, \ldots, z_n$, that have
this multinomial distribution. It is interesting to note that while in the
previous example the z-values provided an arrangement of the
x-values, here the z-values do not need to equal those x-values. That
is, the values $z_1, z_2, \ldots, z_n$ do not necessarily provide an arrangement
of $x_1, x_2, \ldots, x_n$. It is, however, true that $\sum z_i = \sum x_i = y_1$. Of course,
from the way the z-values were obtained, the two likelihood functions
enjoy the property of being proportional, namely

$$g_1(y_1; \theta)\, \frac{y_1!}{x_1!\, x_2! \cdots x_n!} \left(\frac{1}{n}\right)^{y_1} \propto g_1(y_1; \theta)\, \frac{y_1!}{z_1!\, z_2! \cdots z_n!} \left(\frac{1}{n}\right)^{y_1}.$$

Thus, for illustration, using either of these likelihood functions, the
m.l.e. of $\theta$ is $y_1/n$ because this is the value of $\theta$ that maximizes $g_1(y_1; \theta)$.
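For the Poisson illustration, pseudovalues with the stated multinomial distribution can be generated by dropping each of the $y_1$ counts into one of n equally likely cells. The sketch below assumes hypothetical data and a fixed seed; it checks that the ratio of the two Poisson likelihoods is the same for every value of theta tried, which is what proportionality means.

```python
import math
import random

def poisson_likelihood(values, theta):
    """Product of Poisson(theta) probabilities over the sample."""
    return math.prod(theta ** v * math.exp(-theta) / math.factorial(v) for v in values)

def multinomial_pseudo_sample(n, y1, rng):
    """Distribute y1 counts over n cells, each cell having probability 1/n."""
    z = [0] * n
    for _ in range(y1):
        z[rng.randrange(n)] += 1
    return z

rng = random.Random(1)          # fixed seed, for reproducibility only
x = [3, 0, 2, 5, 1]             # hypothetical observed counts
y1 = sum(x)
z = multinomial_pseudo_sample(len(x), y1, rng)

# The ratio L_x(theta) / L_z(theta) should not change with theta.
ratios = [poisson_likelihood(x, t) / poisson_likelihood(z, t) for t in (0.5, 1.0, 2.0, 4.0)]
print(z, ratios)
```

The z-values generally differ from any rearrangement of the x-values, yet both likelihoods are maximized at the same point, $y_1/n$.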

We have considered how the statistician knowing only the
value of the sufficient statistic can create a sample that satisfies the
likelihood principle; thus, in this sense, she is as well off as
the statistician who has knowledge of all the data. So let us now state
a fairly obvious theorem that relates the m.l.e. of $\theta$ to a sufficient
statistic.


Theorem 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. If a sufficient statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$ exists and if a maximum likelihood
estimator $\hat{\theta}$ of $\theta$ also exists uniquely, then $\hat{\theta}$ is a function of
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$.

Proof. Let $g_1(y_1; \theta)$ be the p.d.f. of $Y_1$. Then by the definition of
sufficiency, the likelihood function

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)
= g_1[u_1(x_1, \ldots, x_n); \theta]\, H(x_1, \ldots, x_n),$$

where $H(x_1, \ldots, x_n)$ does not depend upon $\theta$. Thus $L$ and $g_1$, as
functions of $\theta$, are maximized simultaneously. Since there is one and
only one value of $\theta$ that maximizes $L$ and hence $g_1[u_1(x_1, \ldots, x_n); \theta]$,
that value of $\theta$ must be a function of $u_1(x_1, \ldots, x_n)$. Thus the m.l.e.
$\hat{\theta}$ is a function of the sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$.

Let us consider another important property possessed by a
sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$. The conditional p.d.f.
of a second statistic, say $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, given $Y_1 = y_1$, does
not depend upon $\theta$. On intuitive grounds, we might surmise that the
conditional p.d.f. of $Y_2$, given some linear function $aY_1 + b$, $a \neq 0$,
of $Y_1$, does not depend upon $\theta$. That is, it seems as though the
random variable $aY_1 + b$ is also a sufficient statistic for $\theta$. This
conjecture is correct. In fact, every function $Z = u(Y_1)$, or $Z =
u[u_1(X_1, X_2, \ldots, X_n)] = v(X_1, X_2, \ldots, X_n)$, not involving $\theta$, with a
single-valued inverse $Y_1 = w(Z)$, is also a sufficient statistic for $\theta$. To
prove this, we write, in accordance with the factorization theorem,

$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1[u_1(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n).$$

However, we find that $y_1 = w(z)$ or, equivalently, $u_1(x_1, x_2, \ldots, x_n) =
w[v(x_1, x_2, \ldots, x_n)]$, which is not a function of $\theta$. Hence

$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1\{w[v(x_1, \ldots, x_n)]; \theta\}\, k_2(x_1, x_2, \ldots, x_n).$$

Since the first factor of the right-hand member of this equation is a
function of $z = v(x_1, \ldots, x_n)$ and $\theta$, while the second factor does not
depend upon $\theta$, the factorization theorem implies that $Z = u(Y_1)$ is also
a sufficient statistic for $\theta$.

Possibly, the preceding observation is obvious if we think about the
sufficient statistic partitioning the sample space in such a way that
the conditional probability of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$, does not
depend upon $\theta$. We say this because every function $Z = u(Y_1)$ with a
single-valued inverse $Y_1 = w(Z)$ would partition the sample space in
exactly the same way, that is, the set of points

$$\{(x_1, x_2, \ldots, x_n) : u_1(x_1, x_2, \ldots, x_n) = y_1\},$$

for each $y_1$, is exactly the same as

$$\{(x_1, x_2, \ldots, x_n) : v(x_1, x_2, \ldots, x_n) = u(y_1)\}$$

because $w[v(x_1, x_2, \ldots, x_n)] = u_1(x_1, x_2, \ldots, x_n) = y_1$.
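The partition argument can be made concrete on a small discrete sample space. The sketch below, under assumptions that are not in the text (samples of three 0/1 values, the one-to-one map $u(y) = 2y + 3$, and a deliberately non-invertible map for contrast), groups all sample points by the value of a statistic and compares the resulting partitions.

```python
from collections import defaultdict
from itertools import product

def partition_by(statistic, n):
    """Partition all 0/1 samples of size n into cells of equal statistic value."""
    cells = defaultdict(set)
    for xs in product((0, 1), repeat=n):
        cells[statistic(xs)].add(xs)
    return {frozenset(cell) for cell in cells.values()}

def u1(xs):
    """The sufficient statistic Y_1 = sum of the observations."""
    return sum(xs)

def v(xs):
    """Z = u(Y_1) with the hypothetical one-to-one choice u(y) = 2y + 3."""
    return 2 * sum(xs) + 3

def parity(xs):
    """A function of Y_1 without a single-valued inverse, for contrast."""
    return sum(xs) % 2

same_partition = partition_by(u1, 3) == partition_by(v, 3)
coarser_partition = partition_by(u1, 3) != partition_by(parity, 3)
print(same_partition, coarser_partition)
```

The one-to-one function only relabels the cells, while the non-invertible one merges cells and so induces a different partition.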

Remark. Throughout the discussion of sufficient statistics, as a matter of 
fact throughout much of the mathematics of statistical inference, we hope 
the reader recognizes the importance of the assumption of having a certain 
model. Clearly, when we say that a statistician having the value of a certain 
statistic (here sufficient) is as well off in making statistical inferences as the 
statistician who has all of the data, we depend upon the fact that a certain 
model is true. For illustration, knowing that we have i.i.d. variables, each with 
p.d.f. $f(x; \theta)$, is extremely important; because if that $f(x; \theta)$ is incorrect or if
the independence assumption does not hold, our resulting inferences could 
be very bad. The statistician with all the data could — and should — check to 
see if the model is reasonably good. Such procedures checking the model are 
often called model diagnostics, the discussion of which we leave to a more 
applied course in statistics.

We now consider a result of Rao and Blackwell from which we see
that we need consider only functions of the sufficient statistic in finding
the unbiased point estimates of parameters. In showing this, we
can refer back to a result of Section 2.2: If $X_1$ and $X_2$ are random
variables and certain expectations exist, then

$$E[X_2] = E[E(X_2 | X_1)]$$

and

$$\mathrm{var}(X_2) \geq \mathrm{var}[E(X_2 | X_1)].$$

For the adaptation in context of sufficient statistics, we let the sufficient
statistic $Y_1$ be $X_1$ and $Y_2$, an unbiased statistic of $\theta$, be $X_2$. Thus, with
$E(Y_2 | y_1) = \varphi(y_1)$, we have

$$\theta = E(Y_2) = E[\varphi(Y_1)]$$

and

$$\mathrm{var}(Y_2) \geq \mathrm{var}[\varphi(Y_1)].$$

That is, through this conditioning, the function $\varphi(Y_1)$ of the sufficient
statistic $Y_1$ is an unbiased estimator of $\theta$ having smaller variance than
that of the unbiased estimator $Y_2$. We summarize this discussion more
formally in the following theorem, which can be attributed to Rao and
Blackwell.

Theorem 3. Let $X_1, X_2, \ldots, X_n$, n a fixed positive integer, denote a
random sample from a distribution (continuous or discrete) that has p.d.f.
$f(x; \theta)$, $\theta \in \Omega$. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$,
and let $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, not a function of $Y_1$ alone, be an
unbiased estimator of $\theta$. Then $E(Y_2 | y_1) = \varphi(y_1)$ defines a statistic $\varphi(Y_1)$.
This statistic $\varphi(Y_1)$ is a function of the sufficient statistic for $\theta$; it is an
unbiased estimator of $\theta$; and its variance is less than that of $Y_2$.

This theorem tells us that in our search for an unbiased minimum
variance estimator of a parameter, we may, if a sufficient statistic for
the parameter exists, restrict that search to functions of the sufficient
statistic. For if we begin with an unbiased estimator $Y_2$ that is not a
function of the sufficient statistic $Y_1$ alone, then we can always improve
on this by computing $E(Y_2 | y_1) = \varphi(y_1)$ so that $\varphi(Y_1)$ is an unbiased
estimator with smaller variance than that of $Y_2$.
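The improvement promised by Theorem 3 can be verified exactly in a small case by enumerating the sample space. The sketch below uses an illustrative setting that is an assumption of this example rather than part of the text: a Bernoulli sample of size $n = 4$ with $\theta = 0.3$, the crude unbiased estimator $Y_2 = X_1$, and the sufficient statistic $Y_1 = \sum X_i$. It computes $\varphi(y_1) = E(X_1 | y_1)$ by enumeration and compares means and variances.

```python
from itertools import product

def outcome_prob(xs, theta):
    """Probability of one particular 0/1 sample under Bernoulli(theta)."""
    s = sum(xs)
    return theta ** s * (1 - theta) ** (len(xs) - s)

def conditional_mean_x1(n, y, theta):
    """E(X_1 | Y_1 = y), computed by summing over all samples with sum y."""
    num = den = 0.0
    for xs in product((0, 1), repeat=n):
        if sum(xs) == y:
            prob = outcome_prob(xs, theta)
            num += xs[0] * prob
            den += prob
    return num / den

def mean_and_variance(stat, n, theta):
    """Exact mean and variance of stat(X_1, ..., X_n) by enumeration."""
    points = list(product((0, 1), repeat=n))
    mean = sum(stat(xs) * outcome_prob(xs, theta) for xs in points)
    var = sum((stat(xs) - mean) ** 2 * outcome_prob(xs, theta) for xs in points)
    return mean, var

n, theta = 4, 0.3   # hypothetical sample size and parameter value
phi = {y: conditional_mean_x1(n, y, theta) for y in range(n + 1)}

mean_y2, var_y2 = mean_and_variance(lambda xs: xs[0], n, theta)
mean_phi, var_phi = mean_and_variance(lambda xs: phi[sum(xs)], n, theta)
print(phi)
print(mean_y2, var_y2, mean_phi, var_phi)
```

In this setting the conditional mean works out to $y_1/n$, which does not involve $\theta$, and both estimators have mean $\theta$ while the conditioned one has the smaller variance.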

After Theorem 3 many students believe that it is necessary to find
first some unbiased estimator $Y_2$ in their search for $\varphi(Y_1)$, an unbiased
estimator of $\theta$ based upon the sufficient statistic $Y_1$. This is not the case
at all, and Theorem 3 simply convinces us that we can restrict our
search for a best estimator to functions of $Y_1$. It frequently happens
that $E(Y_1) = a\theta + b$, where $a \neq 0$ and $b$ are constants, and thus
$(Y_1 - b)/a$ is a function of $Y_1$ that is an unbiased estimator of $\theta$. That
is, we can usually find an unbiased estimator based on $Y_1$ without first
finding an estimator $Y_2$. In the next two sections we discover that, in
most instances, if there is one function $\varphi(Y_1)$ that is unbiased, $\varphi(Y_1)$
is the only unbiased estimator based on the sufficient statistic $Y_1$.

Remark. Since the unbiased estimator $\varphi(Y_1)$, where $\varphi(y_1) = E(Y_2 | y_1)$, has
variance smaller than that of the unbiased estimator $Y_2$ of $\theta$, students
sometimes reason as follows. Let the function $\Upsilon(y_3) = E[\varphi(Y_1) | Y_3 = y_3]$,
where $Y_3$ is another statistic, which is not sufficient for $\theta$. By the
Rao-Blackwell theorem, we have that $E[\Upsilon(Y_3)] = \theta$ and $\Upsilon(Y_3)$ has a smaller
variance than does $\varphi(Y_1)$. Accordingly, $\Upsilon(Y_3)$ must be better than $\varphi(Y_1)$ as
an unbiased estimator of $\theta$. But this is not true because $Y_3$ is not sufficient; thus
$\theta$ is present in the conditional distribution of $Y_1$, given $Y_3 = y_3$, and the
conditional mean $\Upsilon(y_3)$. So although indeed $E[\Upsilon(Y_3)] = \theta$, $\Upsilon(Y_3)$ is not even
a statistic because it involves the unknown parameter $\theta$ and hence cannot be
used as an estimator.


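Example 1. Let $X_1, X_2, X_3$ be a random sample from an exponential
distribution with mean $\theta > 0$, so that the joint p.d.f. is

$$\left(\frac{1}{\theta}\right)^{3} e^{-(x_1 + x_2 + x_3)/\theta}, \quad 0 < x_i < \infty,$$

$i = 1, 2, 3$, zero elsewhere. From the factorization theorem, we see that
$Y_1 = X_1 + X_2 + X_3$ is a sufficient statistic for $\theta$. Of course,

$$E(Y_1) = E(X_1 + X_2 + X_3) = 3\theta,$$

and thus $Y_1/3 = \bar{X}$ is a function of the sufficient statistic that is an unbiased
estimator of $\theta$.

In addition, let $Y_2 = X_2 + X_3$ and $Y_3 = X_3$. The one-to-one transformation
defined by

$$x_1 = y_1 - y_2, \quad x_2 = y_2 - y_3, \quad x_3 = y_3$$

has Jacobian equal to 1 and the joint p.d.f. of $Y_1, Y_2, Y_3$ is

$$g(y_1, y_2, y_3; \theta) = \left(\frac{1}{\theta}\right)^{3} e^{-y_1/\theta}, \quad 0 < y_3 < y_2 < y_1 < \infty,$$

zero elsewhere. The marginal p.d.f. of $Y_1$ and $Y_3$ is found by integrating out
$y_2$ to obtain

$$g_{13}(y_1, y_3; \theta) = \left(\frac{1}{\theta}\right)^{3} (y_1 - y_3) e^{-y_1/\theta}, \quad 0 < y_3 < y_1 < \infty,$$

zero elsewhere. The p.d.f. of $Y_3$ alone is

$$g_3(y_3; \theta) = \frac{1}{\theta} e^{-y_3/\theta}, \quad 0 < y_3 < \infty,$$

zero elsewhere, since $Y_3 = X_3$ is an observation of a random sample from this
exponential distribution.

Accordingly, the conditional p.d.f. of $Y_1$, given $Y_3 = y_3$, is

$$g_{1|3}(y_1 | y_3) = \frac{g_{13}(y_1, y_3; \theta)}{g_3(y_3; \theta)} = \left(\frac{1}{\theta}\right)^{2} (y_1 - y_3) e^{-(y_1 - y_3)/\theta}, \quad 0 < y_3 < y_1 < \infty,$$

zero elsewhere. Thus

$$E\left(\frac{Y_1}{3} \,\Big|\, y_3\right) = E\left(\frac{Y_1 - Y_3}{3} \,\Big|\, y_3\right) + E\left(\frac{Y_3}{3} \,\Big|\, y_3\right)$$
$$= \frac{1}{3} \int_{y_3}^{\infty} \left(\frac{1}{\theta}\right)^{2} (y_1 - y_3)^{2} e^{-(y_1 - y_3)/\theta}\, dy_1 + \frac{y_3}{3}$$
$$= \frac{1}{3}\, \frac{\Gamma(3)\theta^{3}}{\theta^{2}} + \frac{y_3}{3} = \frac{2\theta}{3} + \frac{y_3}{3} = \Upsilon(y_3).$$

Of course, $E[\Upsilon(Y_3)] = \theta$ and $\mathrm{var}[\Upsilon(Y_3)] \leq \mathrm{var}(Y_1/3)$, but $\Upsilon(Y_3)$ is not
a statistic as it involves $\theta$ and cannot be used as an estimator of $\theta$. This
illustrates the preceding remark.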
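A short simulation makes the example tangible. This is a hedged sketch rather than part of the text: the value of theta, the number of trials, and the seed are arbitrary choices, and the results are Monte Carlo approximations. It evaluates $\bar{X} = Y_1/3$ and the conditional mean $2\theta/3 + Y_3/3$ on simulated samples; note that the second quantity can only be computed here because the simulation knows theta.

```python
import random
from statistics import fmean

def simulate(theta, trials, seed=0):
    """Draw exponential samples of size 3 with mean theta; return both quantities."""
    rng = random.Random(seed)
    xbar_values, upsilon_values = [], []
    for _ in range(trials):
        x1, x2, x3 = (rng.expovariate(1 / theta) for _ in range(3))
        xbar_values.append((x1 + x2 + x3) / 3)           # Y_1 / 3, a genuine statistic
        upsilon_values.append(2 * theta / 3 + x3 / 3)    # needs the unknown theta
    return xbar_values, upsilon_values

def variance(values):
    """Population-style variance of a list of numbers."""
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])

theta = 2.0   # hypothetical true parameter, known only to the simulation
xbar_values, upsilon_values = simulate(theta, trials=100_000)

mean_upsilon = fmean(upsilon_values)
var_xbar = variance(xbar_values)
var_upsilon = variance(upsilon_values)
print(mean_upsilon, var_xbar, var_upsilon)
```

The simulated values should be close to $\theta$, $\theta^2/3$ and $\theta^2/9$ respectively: the conditional mean has the smaller variance, but it is unusable in practice because it contains $\theta$.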


EXERCISES 

7.18. In each of the Exercises 7.10, 7.11, 7.13, and 7.14, show that the m.l.e.
of $\theta$ is a function of the sufficient statistic for $\theta$.

7.19. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ be the order statistics of a random sample
of size 5 from the uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$,
$0 < x < \theta$, $0 < \theta < \infty$, zero elsewhere. Show that $2Y_3$ is an unbiased
estimator of $\theta$. Determine the joint p.d.f. of $Y_3$ and the sufficient statistic
$Y_5$ for $\theta$. Find the conditional expectation $E(2Y_3 | y_5) = \varphi(y_5)$. Compare the
variances of $2Y_3$ and $\varphi(Y_5)$.

Hint: All of the integrals needed in this exercise can be evaluated by
making a change of variable such as $z = y/\theta$ and using the results associated
with the beta p.d.f.; see Section 4.4.

7.20. If $X_1, X_2$ is a random sample of size 2 from a distribution having p.d.f.
$f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere, find the joint
p.d.f. of the sufficient statistic $Y_1 = X_1 + X_2$ for $\theta$ and $Y_2 = X_2$. Show that
$Y_2$ is an unbiased estimator of $\theta$ with variance $\theta^2$. Find $E(Y_2 | y_1) = \varphi(y_1)$ and
the variance of $\varphi(Y_1)$.

7.21. Let the random variables $X$ and $Y$ have the joint p.d.f.
$f(x, y) = (2/\theta^2)e^{-(x + y)/\theta}$, $0 < x < y < \infty$, zero elsewhere.

(a) Show that the mean and the variance of $Y$ are, respectively, $3\theta/2$ and
$5\theta^2/4$.

(b) Show that $E(Y | x) = x + \theta$. In accordance with the theory, the expected
value of $X + \theta$ is that of $Y$, namely, $3\theta/2$, and the variance of $X + \theta$ is
less than that of $Y$. Show that the variance of $X + \theta$ is in fact $\theta^2/4$.

7.22. In each of Exercises 7.19, 7.11, and 7.12, compute the expected value 
of the given sufficient statistic and, in each case, determine an unbiased 
estimator of $\theta$ that is a function of that sufficient statistic alone.


7.4 Completeness and Uniqueness 

Let $X_1, X_2, \ldots, X_n$ be a random sample from the Poisson
distribution that has p.d.f.

$$f(x; \theta) = \frac{\theta^{x} e^{-\theta}}{x!}, \quad x = 0, 1, 2, \ldots;\ 0 < \theta;$$
$$\qquad = 0 \quad \text{elsewhere}.$$

From Exercise 7.11 of Section 7.2 we know that $Y_1 = \sum_{i=1}^{n} X_i$ is a
sufficient statistic for $\theta$ and its p.d.f. is

$$g_1(y_1; \theta) = \frac{(n\theta)^{y_1} e^{-n\theta}}{y_1!}, \quad y_1 = 0, 1, 2, \ldots,$$
$$\qquad = 0 \quad \text{elsewhere}.$$

Let us consider the family $\{g_1(y_1; \theta) : 0 < \theta\}$ of probability density
functions. Suppose that the function $u(Y_1)$ of $Y_1$ is such that
$E[u(Y_1)] = 0$ for every $\theta > 0$. We shall show that this requires $u(y_1)$
to be zero at every point $y_1 = 0, 1, 2, \ldots$. That is,

$$E[u(Y_1)] = 0, \quad 0 < \theta,$$

implies that

$$0 = u(0) = u(1) = u(2) = u(3) = \cdots.$$

We have for all $\theta > 0$ that

$$0 = E[u(Y_1)] = \sum_{y_1 = 0}^{\infty} u(y_1) \frac{(n\theta)^{y_1} e^{-n\theta}}{y_1!}
= e^{-n\theta}\left[u(0) + u(1)\frac{n\theta}{1!} + u(2)\frac{(n\theta)^{2}}{2!} + \cdots\right].$$

Since $e^{-n\theta}$ does not equal zero, we have that

$$0 = u(0) + [n u(1)]\theta + \left[\frac{n^{2} u(2)}{2}\right]\theta^{2} + \cdots.$$

However, if such an infinite series converges to zero for all $\theta > 0$, then
each of the coefficients must equal zero. That is,

$$u(0) = 0, \quad n u(1) = 0, \quad \frac{n^{2} u(2)}{2} = 0, \ldots$$

and thus $0 = u(0) = u(1) = u(2) = \cdots$, as we wanted to show. Of
course, the condition $E[u(Y_1)] = 0$ for all $\theta > 0$ does not place any
restriction on $u(y_1)$ when $y_1$ is not a nonnegative integer. So we see that,
in this illustration, $E[u(Y_1)] = 0$ for all $\theta > 0$ requires that $u(y_1)$ equals
zero except on a set of points that has probability zero for each p.d.f.
$g_1(y_1; \theta)$, $0 < \theta$. From the following definition we observe that the
family $\{g_1(y_1; \theta) : 0 < \theta\}$ is complete.

Definition 3. Let the random variable $Z$ of either the continuous
type or the discrete type have a p.d.f. that is one member of the family
$\{h(z; \theta) : \theta \in \Omega\}$. If the condition $E[u(Z)] = 0$, for every $\theta \in \Omega$, requires
that $u(z)$ be zero except on a set of points that has probability zero for
each p.d.f. $h(z; \theta)$, $\theta \in \Omega$, then the family $\{h(z; \theta) : \theta \in \Omega\}$ is called a
complete family of probability density functions.

Remark. In Section 1.9 it was noted that the existence of $E[u(X)]$ implies
that the integral (or sum) converges absolutely. This absolute convergence 
was tacitly assumed in our definition of completeness and it is needed to 
prove that certain families of probability density functions are complete. 

In order to show that certain families of probability density 
functions of the continuous type are complete, we must appeal to the 
same type of theorem in analysis that we used when we claimed that 
the moment-generating function uniquely determines a distribution. 
This is illustrated in the next example.

Example 1. Let $Z$ have a p.d.f. that is a member of the family
$\{h(z; \theta) : 0 < \theta < \infty\}$, where

$$h(z; \theta) = \frac{1}{\theta} e^{-z/\theta}, \quad 0 < z < \infty,$$
$$\qquad = 0 \quad \text{elsewhere}.$$

Let us say that $E[u(Z)] = 0$ for every $\theta > 0$. That is,

$$\frac{1}{\theta} \int_{0}^{\infty} u(z) e^{-z/\theta}\, dz = 0, \quad \text{for } \theta > 0.$$

Readers acquainted with the theory of transforms will recognize the integral
in the left-hand member as being essentially the Laplace transform of $u(z)$. In
that theory we learn that the only function $u(z)$ transforming to a function of
$\theta$ which is identically equal to zero is $u(z) = 0$, except (in our terminology) on
a set of points that has probability zero for each $h(z; \theta)$, $0 < \theta$. That is, the
family $\{h(z; \theta) : 0 < \theta < \infty\}$ is complete.

Let the parameter $\theta$ in the p.d.f. $f(x; \theta)$, $\theta \in \Omega$, have a sufficient
statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ is a random
sample from this distribution. Let the p.d.f. of $Y_1$ be $g_1(y_1; \theta)$, $\theta \in \Omega$.
It has been seen that, if there is any unbiased estimator $Y_2$ (not a
function of $Y_1$ alone) of $\theta$, then there is at least one function of $Y_1$ that
is an unbiased estimator of $\theta$, and our search for a best estimator of
$\theta$ may be restricted to functions of $Y_1$. Suppose it has been verified that
a certain function $\varphi(Y_1)$, not a function of $\theta$, is such that $E[\varphi(Y_1)] = \theta$
for all values of $\theta$, $\theta \in \Omega$. Let $\psi(Y_1)$ be another function of the sufficient
statistic $Y_1$ alone, so that we also have $E[\psi(Y_1)] = \theta$ for all values of
$\theta$, $\theta \in \Omega$. Hence

$$E[\varphi(Y_1) - \psi(Y_1)] = 0, \quad \theta \in \Omega.$$

If the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ is complete, the function $\varphi(y_1) -
\psi(y_1) = 0$, except on a set of points that has probability zero. That is,
for every other unbiased estimator $\psi(Y_1)$ of $\theta$, we have

$$\varphi(y_1) = \psi(y_1)$$

except possibly at certain special points. Thus, in this sense [namely
$\varphi(y_1) = \psi(y_1)$, except on a set of points with probability zero], $\varphi(Y_1)$
is the unique function of $Y_1$ which is an unbiased estimator of $\theta$. In
accordance with the Rao-Blackwell theorem, $\varphi(Y_1)$ has a smaller
variance than every other unbiased estimator of $\theta$. That is, the
statistic $\varphi(Y_1)$ is the unbiased minimum variance estimator of $\theta$. This
fact is stated in the following theorem of Lehmann and Scheffé.

Theorem 4. Let $X_1, X_2, \ldots, X_n$, n a fixed positive integer, denote a
random sample from a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$, let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let the family
$\{g_1(y_1; \theta) : \theta \in \Omega\}$ of probability density functions be complete. If there
is a function of $Y_1$ that is an unbiased estimator of $\theta$, then this function
of $Y_1$ is the unique unbiased minimum variance estimator of $\theta$. Here
“unique” is used in the sense described in the preceding paragraph.

The statement that $Y_1$ is a sufficient statistic for a parameter $\theta$,
$\theta \in \Omega$, and that the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ of probability density
functions is complete is lengthy and somewhat awkward. We shall
adopt the less descriptive, but more convenient, terminology that $Y_1$
is a complete sufficient statistic for $\theta$. In the next section we study a fairly
large class of probability density functions for which a complete
sufficient statistic for $\theta$ can be determined by inspection.

EXERCISES 

7.23. If $az^2 + bz + c = 0$ for more than two values of $z$, then $a = b = c = 0$.
Use this result to show that the family $\{b(2, \theta) : 0 < \theta < 1\}$ is complete.

7.24. Show that each of the following families is not complete by finding at
least one nonzero function $u(x)$ such that $E[u(X)] = 0$, for all $\theta > 0$.

(a) $f(x; \theta) = \dfrac{1}{2\theta}$, $-\theta < x < \theta$, where $0 < \theta < \infty$,
$= 0$ elsewhere.

(b) $N(0, \theta)$, where $0 < \theta < \infty$.

7.25. Let $X_1, X_2, \ldots, X_n$ represent a random sample from the discrete
distribution having the probability density function

$$f(x; \theta) = \theta^{x} (1 - \theta)^{1 - x}, \quad x = 0, 1,\ 0 < \theta < 1,$$
$$\qquad = 0 \quad \text{elsewhere}.$$

Show that $Y_1 = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for $\theta$. Find the unique
function of $Y_1$ that is the unbiased minimum variance estimator of $\theta$.

Hint: Display $E[u(Y_1)] = 0$, show that the constant term $u(0)$ is equal
to zero, divide both members of the equation by $\theta \neq 0$, and repeat the
argument.

7.26. Consider the family of probability density functions $\{h(z; \theta) : \theta \in \Omega\}$,
where $h(z; \theta) = 1/\theta$, $0 < z < \theta$, zero elsewhere.

(a) Show that the family is complete provided that $\Omega = \{\theta : 0 < \theta < \infty\}$.
Hint: For convenience, assume that $u(z)$ is continuous and note that the
derivative of $E[u(Z)]$ with respect to $\theta$ is equal to zero also.

(b) Show that this family is not complete if $\Omega = \{\theta : 1 < \theta < \infty\}$.

Hint: Concentrate on the interval $0 < z < 1$ and find a nonzero function
$u(z)$ on that interval such that $E[u(Z)] = 0$ for all $\theta > 1$.

7.27. Show that the first order statistic $Y_1$ of a random sample of size n from
the distribution having p.d.f. $f(x; \theta) = e^{-(x - \theta)}$, $\theta < x < \infty$, $-\infty < \theta < \infty$,
zero elsewhere, is a complete sufficient statistic for $\theta$. Find the unique
function of this statistic which is the unbiased minimum variance estimator
of $\theta$.

7.28. Let a random sample of size n be taken from a distribution of the discrete
type with p.d.f. $f(x; \theta) = 1/\theta$, $x = 1, 2, \ldots, \theta$, zero elsewhere, where $\theta$ is an
unknown positive integer.

(a) Show that the largest observation, say $Y$, of the sample is a complete
sufficient statistic for $\theta$.

(b) Prove that

$$[Y^{n+1} - (Y - 1)^{n+1}] / [Y^{n} - (Y - 1)^{n}]$$

is the unique unbiased minimum variance estimator of $\theta$.

7.5 The Exponential Class of Probability Density Functions 

Consider a family $\{f(x; \theta) : \theta \in \Omega\}$ of probability density functions,
where $\Omega$ is the interval set $\Omega = \{\theta : \gamma < \theta < \delta\}$, where $\gamma$ and $\delta$ are known
constants, and where

$$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)], \quad a < x < b,$$
$$\qquad = 0 \quad \text{elsewhere}. \qquad (1)$$

A p.d.f. of the form (1) is said to be a member of the exponential
class of probability density functions of the continuous type. If, in
addition,

1. neither $a$ nor $b$ depends upon $\theta$, $\gamma < \theta < \delta$,

2. $p(\theta)$ is a nontrivial continuous function of $\theta$, $\gamma < \theta < \delta$,

3. each of $K'(x) \not\equiv 0$ and $S(x)$ is a continuous function of $x$, $a < x < b$,

we say that we have a regular case of the exponential class. A p.d.f.

$$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)], \quad x = a_1, a_2, a_3, \ldots,$$
$$\qquad = 0 \quad \text{elsewhere},$$

is said to represent a regular case of the exponential class of probability
density functions of the discrete type if

1. The set $\{x : x = a_1, a_2, \ldots\}$ does not depend upon $\theta$.

2. $p(\theta)$ is a nontrivial continuous function of $\theta$, $\gamma < \theta < \delta$.

3. $K(x)$ is a nontrivial function of $x$ on the set $\{x : x = a_1, a_2, \ldots\}$.

For example, each member of the family $\{f(x; \theta) : 0 < \theta < \infty\}$,
where $f(x; \theta)$ is $N(0, \theta)$, represents a regular case of the exponential
class of the continuous type because

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-x^{2}/2\theta} = \exp\left(-\frac{1}{2\theta} x^{2} - \ln\sqrt{2\pi\theta}\right), \quad -\infty < x < \infty.$$
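The $N(0, \theta)$ example can be checked by writing the p.d.f. in the exponential-class form with $p(\theta) = -1/(2\theta)$, $K(x) = x^2$, $S(x) = 0$ and $q(\theta) = -\ln\sqrt{2\pi\theta}$, and comparing it with a standard normal density routine. The sketch below is illustrative only; the function name and the grid of test values are assumptions of this example.

```python
import math
from statistics import NormalDist

def exponential_class_pdf(x, theta):
    """N(0, theta) density written as exp[p(theta) K(x) + S(x) + q(theta)]."""
    p = -1.0 / (2.0 * theta)                         # p(theta)
    K = x * x                                        # K(x)
    S = 0.0                                          # S(x)
    q = -math.log(math.sqrt(2.0 * math.pi * theta))  # q(theta)
    return math.exp(p * K + S + q)

# Compare with the library density; NormalDist takes the standard deviation.
max_gap = max(
    abs(exponential_class_pdf(x, theta) - NormalDist(0.0, math.sqrt(theta)).pdf(x))
    for theta in (0.5, 1.0, 4.0)
    for x in (-2.0, -0.3, 0.0, 1.1, 3.0)
)
print(max_gap)
```

Since $K(x) = x^2$ here, the representation also suggests that the sum of squared observations is the natural statistic for this family.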



Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution
that has a p.d.f. that represents a regular case of the exponential class
of the continuous type. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$\exp\left[p(\theta) \sum_{i=1}^{n} K(x_i) + \sum_{i=1}^{n} S(x_i) + nq(\theta)\right]$$

for $a < x_i < b$, $i = 1, 2, \ldots, n$, $\gamma < \theta < \delta$, and is zero elsewhere. At
points of positive probability density, this joint p.d.f. may be written
as the product of the two nonnegative functions

$$\exp\left[p(\theta) \sum_{i=1}^{n} K(x_i) + nq(\theta)\right] \exp\left[\sum_{i=1}^{n} S(x_i)\right].$$

In accordance with the factorization theorem (Theorem 1, Section 7.2),
$Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for the parameter $\theta$. To prove that
$Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$ in the discrete case, we take
the joint p.d.f. of $X_1, X_2, \ldots, X_n$ to be positive on a discrete set of
points, say, when $x_i \in \{x : x = a_1, a_2, \ldots\}$, $i = 1, 2, \ldots, n$. We then use
the factorization theorem. It is left as an exercise to show that in
either the continuous or the discrete case the p.d.f. of $Y_1$ is of the form

$$g_1(y_1; \theta) = R(y_1) \exp[p(\theta) y_1 + nq(\theta)]$$

at points of positive probability density. The points of positive
probability density and the function $R(y_1)$ do not depend upon $\theta$.

At this time we use a theorem in analysis to assert that the family
$\{g_1(y_1; \theta) : \gamma < \theta < \delta\}$ of probability density functions is complete.
This is the theorem we used when we asserted that a moment-generating
function (when it exists) uniquely determines a distribution.
In the present context it can be stated as follows.

Theorem 5. Let $f(x; \theta)$, $\gamma < \theta < \delta$, be a p.d.f. which represents a
regular case of the exponential class. Then if $X_1, X_2, \ldots, X_n$ (where n
is a fixed positive integer) is a random sample from a distribution with
p.d.f. $f(x; \theta)$, the statistic $Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$ and
the family $\{g_1(y_1; \theta) : \gamma < \theta < \delta\}$ of probability density functions of $Y_1$
is complete. That is, $Y_1$ is a complete sufficient statistic for $\theta$.

This theorem has useful implications. In a regular case of form
(1), we can see by inspection that the sufficient statistic is $Y_1 = \sum_{i=1}^{n} K(X_i)$.
If we can see how to form a function of $Y_1$, say $\varphi(Y_1)$, so that
$E[\varphi(Y_1)] = \theta$, then the statistic $\varphi(Y_1)$ is unique and is the unbiased
minimum variance estimator of $\theta$.

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal
distribution that has p.d.f.

$$f(x; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \theta)^{2}}{2\sigma^{2}}\right], \quad -\infty < x < \infty,\ -\infty < \theta < \infty,$$

or

$$f(x; \theta) = \exp\left(\frac{\theta}{\sigma^{2}} x - \frac{x^{2}}{2\sigma^{2}} - \ln\sqrt{2\pi\sigma^{2}} - \frac{\theta^{2}}{2\sigma^{2}}\right).$$

Here $\sigma^2$ is any fixed positive number. This is a regular case of the exponential
class with

$$p(\theta) = \frac{\theta}{\sigma^{2}}, \quad K(x) = x,$$
$$S(x) = -\frac{x^{2}}{2\sigma^{2}} - \ln\sqrt{2\pi\sigma^{2}}, \quad q(\theta) = -\frac{\theta^{2}}{2\sigma^{2}}.$$

Accordingly, $Y_1 = X_1 + X_2 + \cdots + X_n = n\bar{X}$ is a complete sufficient statistic
for the mean $\theta$ of a normal distribution for every fixed value of the variance
$\sigma^2$. Since $E(Y_1) = n\theta$, then $\varphi(Y_1) = Y_1/n = \bar{X}$ is the only function of $Y_1$ that
is an unbiased estimator of $\theta$; and being a function of the sufficient statistic
$Y_1$, it has a minimum variance. That is, $\bar{X}$ is the unique unbiased minimum
variance estimator of $\theta$. Incidentally, since $Y_1$ is a one-to-one function of
$\bar{X}$, $\bar{X}$ itself is also a complete sufficient statistic for $\theta$.

Example 2. Consider a Poisson distribution with parameter $\theta$, $0 < \theta < \infty$.
The p.d.f. of this distribution is

$$f(x; \theta) = \frac{\theta^{x} e^{-\theta}}{x!} = \exp[(\ln \theta)x - \ln(x!) - \theta], \quad x = 0, 1, 2, \ldots,$$
$$\qquad = 0 \quad \text{elsewhere}.$$

In accordance with Theorem 5, $Y_1 = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for
$\theta$. Since $E(Y_1) = n\theta$, the statistic $\varphi(Y_1) = Y_1/n = \bar{X}$, which is also a complete
sufficient statistic for $\theta$, is the unique unbiased minimum variance estimator
of $\theta$.
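For the Poisson case the general form $g_1(y_1; \theta) = R(y_1)\exp[p(\theta)y_1 + nq(\theta)]$ can be checked directly, with $p(\theta) = \ln\theta$, $q(\theta) = -\theta$, and the choice $R(y_1) = n^{y_1}/y_1!$ (this $R$ is worked out for this sketch and is not stated in the text). The code below builds the distribution of the sum by repeated convolution of single-observation probabilities and compares it with that form; the values of n and theta are arbitrary.

```python
import math

def poisson_pmf(x, theta):
    """Single-observation p.d.f. in exponential form: exp[(ln theta) x - ln(x!) - theta]."""
    return math.exp(math.log(theta) * x - math.log(math.factorial(x)) - theta)

def convolve(a, b):
    """Convolution of two probability lists, truncated to the length of a."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(out):
                out[i + j] += ai * bj
    return out

def pmf_of_sum(n, theta, upto):
    """Probabilities P(Y_1 = y), y = 0..upto, for the sum of n observations."""
    single = [poisson_pmf(x, theta) for x in range(upto + 1)]
    total = single
    for _ in range(n - 1):
        total = convolve(total, single)
    return total

def g1(y, n, theta):
    """R(y) exp[p(theta) y + n q(theta)] with R(y) = n^y / y! (assumed for this sketch)."""
    R = n ** y / math.factorial(y)
    return R * math.exp(math.log(theta) * y + n * (-theta))

n, theta, upto = 3, 1.7, 10   # hypothetical values
by_convolution = pmf_of_sum(n, theta, upto)
by_formula = [g1(y, n, theta) for y in range(upto + 1)]
print(max(abs(a - b) for a, b in zip(by_convolution, by_formula)))
```

The truncation does not affect the comparison, because the probability that the sum equals y involves only single-observation values up to y.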

EXERCISES 

7.29. Write the p.d.f.

$$f(x; \theta) = \frac{1}{6\theta^{4}}\, x^{3} e^{-x/\theta}, \quad 0 < x < \infty,\ 0 < \theta < \infty,$$

zero elsewhere, in the exponential form. If $X_1, X_2, \ldots, X_n$ is a random
sample from this distribution, find a complete sufficient statistic $Y_1$ for $\theta$
and the unique function $\varphi(Y_1)$ of this statistic that is the unbiased minimum
variance estimator of $\theta$. Is $\varphi(Y_1)$ itself a complete sufficient statistic?

7.30. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n > 1$ from a
distribution with p.d.f. $f(x; \theta) = \theta e^{-\theta x}$, $0 < x < \infty$, zero elsewhere, and
$\theta > 0$. Then $Y = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$. Prove that $(n - 1)/Y$ is
the unbiased minimum variance estimator of $\theta$.

7.31. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from a distribution
with p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, and $\theta > 0$.

(a) Show that the geometric mean $(X_1 X_2 \cdots X_n)^{1/n}$ of the sample is a
complete sufficient statistic for $\theta$.

(b) Find the maximum likelihood estimator of $\theta$, and observe that it is a
function of this geometric mean.

7.32. Let $\bar{X}$ denote the mean of the random sample $X_1, X_2, \ldots, X_n$ from a
gamma-type distribution with parameters $\alpha > 0$ and $\beta = \theta > 0$. Compute
$E[X_1 | \bar{x}]$.

Hint: Can you find directly a function $\psi(\bar{X})$ of $\bar{X}$ such that
$E[\psi(\bar{X})] = \theta$? Is $E(X_1 | \bar{x}) = \psi(\bar{x})$? Why?

7.33. Let $X$ be a random variable with a p.d.f. of a regular case of the
exponential class. Show that $E[K(X)] = -q'(\theta)/p'(\theta)$, provided these
derivatives exist, by differentiating both members of the equality

$$\int_a^b \exp[p(\theta)K(x) + S(x) + q(\theta)]\, dx = 1$$

with respect to $\theta$. By a second differentiation, find the variance of $K(X)$.

7.34. Given that $f(x; \theta) = \exp[\theta K(x) + S(x) + q(\theta)]$, $a < x < b$, $\gamma < \theta < \delta$,
represents a regular case of the exponential class, show that the moment-generating
function $M(t)$ of $Y = K(X)$ is $M(t) = \exp[q(\theta) - q(\theta + t)]$,
$\gamma < \theta + t < \delta$.

7.35. Given, in the preceding exercise, that $E(Y) = E[K(X)] = \theta$. Prove that
$Y$ is $N(\theta, 1)$.

Hint: Consider $M'(0) = \theta$ and solve the resulting differential equation.

7.36. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution that has a
p.d.f. which is a regular case of the exponential class, show that the p.d.f.
of $Y_1 = \sum_{i=1}^{n} K(X_i)$ is of the form $g_1(y_1; \theta) = R(y_1) \exp[p(\theta)y_1 + nq(\theta)]$.

Hint: Let $Y_2 = X_2, \ldots, Y_n = X_n$ be $n - 1$ auxiliary random variables.
Find the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ and then the marginal p.d.f. of $Y_1$.

7.37. Let $Y$ denote the median and let $\bar{X}$ denote the mean of a random sample
of size $n = 2k + 1$ from a distribution that is $N(\mu, \sigma^2)$. Compute
$E(Y | \bar{X} = \bar{x})$.

Hint: See Exercise 7.32.

7.38. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta^2 x e^{-\theta x}$, $0 < x < \infty$, where $\theta > 0$.

(a) Argue that $Y = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for $\theta$.

(b) Compute $E(1/Y)$ and find the function of $Y$ which is the unique
unbiased minimum variance estimator of $\theta$.

7.39. Let $X_1, X_2, \ldots, X_n$, $n > 2$, be a random sample from the binomial
distribution $b(1, \theta)$.

(a) Show that $Y_1 = X_1 + X_2 + \cdots + X_n$ is a complete sufficient statistic
for $\theta$.

(b) Find the function $\varphi(Y_1)$ which is the unbiased minimum variance
estimator of $\theta$.

(c) Let $Y_2 = (X_1 + X_2)/2$ and compute $E(Y_2)$.

(d) Determine $E(Y_2 | Y_1 = y_1)$.

7.6 Functions of a Parameter


Up to this point we have sought an unbiased and minimum
variance estimator of a parameter $\theta$. Not always, however, are we
interested in $\theta$ but rather in a function of $\theta$. This will be illustrated in
the following examples.


Example 1. Let $X_1, X_2, \ldots, X_n$ denote the observations of a random
sample of size $n > 1$ from a distribution that is $b(1, \theta)$, $0 < \theta < 1$. We know
that if $Y = \sum_{i=1}^{n} X_i$, then $Y/n$ is the unique unbiased minimum variance estimator
of $\theta$. Now the variance of $Y/n$ is $\theta(1 - \theta)/n$. Suppose that an unbiased and
minimum variance estimator of this variance is sought. Because $Y$ is a
sufficient statistic for $\theta$, it is known that we can restrict our search to functions
of $Y$. Consider the statistic $(Y/n)(1 - Y/n)/n$. This statistic is suggested by the
fact that $Y/n$ is an estimator of $\theta$. The expectation of this statistic is given by

$$\frac{1}{n} E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{1}{n^2} E(Y) - \frac{1}{n^3} E(Y^2).$$

Now $E(Y) = n\theta$ and $E(Y^2) = n\theta(1 - \theta) + n^2\theta^2$. Hence

$$\frac{1}{n} E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{n - 1}{n} \cdot \frac{\theta(1 - \theta)}{n}.$$

If we multiply both members of this equation by $n/(n - 1)$, we find that the
statistic $(Y/n)(1 - Y/n)/(n - 1)$ is the unique unbiased minimum variance
estimator of the variance of $Y/n$.
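
The computation in Example 1 can be checked exactly by summing over the binomial p.d.f. The following is only a sketch: the function names and the values n = 10, theta = 0.3 are illustrative assumptions, not part of the text.

```python
from math import comb

def expected_value(stat, n, theta):
    """Exact E[stat(Y)] for Y ~ b(n, theta), summing over the binomial p.d.f."""
    return sum(stat(y) * comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(n + 1))

def first_guess(y, n):
    """The statistic (Y/n)(1 - Y/n)/n, which turns out to be biased."""
    return (y / n) * (1 - y / n) / n

def corrected(y, n):
    """The statistic (Y/n)(1 - Y/n)/(n - 1) obtained at the end of Example 1."""
    return (y / n) * (1 - y / n) / (n - 1)

n, theta = 10, 0.3                    # illustrative values only
target = theta * (1 - theta) / n      # the variance of Y/n
mean_first = expected_value(lambda y: first_guess(y, n), n, theta)
mean_corrected = expected_value(lambda y: corrected(y, n), n, theta)

print("target variance      :", target)
print("E[first guess]       :", mean_first)
print("E[corrected statistic]:", mean_corrected)
```

Under these assumptions the first guess has expectation [(n - 1)/n] times the target, while the corrected statistic matches the target up to rounding.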

A somewhat different, but also very important problem in point
estimation is considered in the next example. In the example the
distribution of a random variable $X$ is described by a p.d.f. $f(x; \theta)$ that
depends upon $\theta \in \Omega$. The problem is to estimate the fractional part of
the probability for this distribution which is at, or to the left of, a fixed
point $c$. Thus we seek an unbiased minimum variance estimator of
$F(c; \theta)$, where $F(x; \theta)$ is the distribution function of $X$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n > 1$ from a
distribution that is $N(\theta, 1)$. Suppose that we wish to find an unbiased minimum
variance estimator of the function of $\theta$ defined by

$$\Pr(X \le c) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}}\, e^{-(x - \theta)^2/2}\, dx = \Phi(c - \theta),$$

where $c$ is a fixed constant. There are many unbiased estimators of $\Phi(c - \theta)$.
We first exhibit one of these, say $u(X_1)$, a function of $X_1$ alone. We shall then
compute the conditional expectation, $E[u(X_1) | \bar{X} = \bar{x}] = \varphi(\bar{x})$, of this unbiased
statistic, given the sufficient statistic $\bar{X}$, the mean of the sample. In accordance
with the theorems of Rao-Blackwell and Lehmann-Scheffé, $\varphi(\bar{X})$ is the
unique unbiased minimum variance estimator of $\Phi(c - \theta)$.

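Consider the function $u(x_1)$, where

$$u(x_1) = 1, \quad x_1 \le c,$$
$$\phantom{u(x_1)} = 0, \quad x_1 > c.$$

The expected value of the random variable $u(X_1)$ is given by

$$E[u(X_1)] = \int_{-\infty}^{\infty} u(x_1) \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x_1 - \theta)^2}{2}\right] dx_1 = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x_1 - \theta)^2}{2}\right] dx_1,$$

because $u(x_1) = 0$, $x_1 > c$. But the latter integral has the value $\Phi(c - \theta)$. That
is, $u(X_1)$ is an unbiased estimator of $\Phi(c - \theta)$.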

We shall next discuss the joint distribution of $X_1$ and $\bar{X}$ and the conditional
distribution of $X_1$, given $\bar{X} = \bar{x}$. This conditional distribution will enable us
to compute $E[u(X_1) | \bar{X} = \bar{x}] = \varphi(\bar{x})$. In accordance with Exercise 4.92, Section
4.7, the joint distribution of $X_1$ and $\bar{X}$ is bivariate normal with means $\theta$ and
$\theta$, variances $\sigma_1^2 = 1$ and $\sigma_2^2 = 1/n$, and correlation coefficient $\rho = 1/\sqrt{n}$. Thus
the conditional p.d.f. of $X_1$, given $\bar{X} = \bar{x}$, is normal with linear conditional
mean

$$\theta + \frac{\rho \sigma_1}{\sigma_2}(\bar{x} - \theta) = \bar{x}$$

and with variance

$$\sigma_1^2 (1 - \rho^2) = \frac{n - 1}{n}.$$

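The conditional expectation of $u(X_1)$, given $\bar{X} = \bar{x}$, is then

$$\varphi(\bar{x}) = \int_{-\infty}^{\infty} u(x_1) \sqrt{\frac{n}{n - 1}}\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{n(x_1 - \bar{x})^2}{2(n - 1)}\right] dx_1 = \int_{-\infty}^{c} \sqrt{\frac{n}{n - 1}}\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{n(x_1 - \bar{x})^2}{2(n - 1)}\right] dx_1.$$

The change of variable $z = \sqrt{n}(x_1 - \bar{x})/\sqrt{n - 1}$ enables us to write, with
$c' = \sqrt{n}(c - \bar{x})/\sqrt{n - 1}$, this conditional expectation as

$$\varphi(\bar{x}) = \int_{-\infty}^{c'} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = \Phi(c') = \Phi\left[\frac{\sqrt{n}(c - \bar{x})}{\sqrt{n - 1}}\right].$$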







Thus the unique, unbiased, and minimum variance estimator of $\Phi(c - \theta)$ is,
for every fixed constant $c$, given by $\varphi(\bar{X}) = \Phi[\sqrt{n}(c - \bar{X})/\sqrt{n - 1}]$.
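
As a numerical sanity check on Example 2, one can integrate the estimator against the N(theta, 1/n) density of the sample mean and compare with Phi(c - theta). The sketch below uses only the standard library; the trapezoidal rule, the function names, and the values theta = 0.4, n = 5, c = 1.0 are assumptions made for illustration.

```python
from math import erf, exp, pi, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def estimator(xbar, n, c):
    """Phi[sqrt(n) (c - xbar) / sqrt(n - 1)], the estimator found in Example 2."""
    return Phi(sqrt(n) * (c - xbar) / sqrt(n - 1))

def expected_estimator(theta, n, c, steps=20000, width=8.0):
    """E[estimator(Xbar)] for Xbar ~ N(theta, 1/n), by the trapezoidal rule."""
    sd = 1.0 / sqrt(n)
    lo, hi = theta - width * sd, theta + width * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        density = exp(-0.5 * ((x - theta) / sd) ** 2) / (sd * sqrt(2.0 * pi))
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * estimator(x, n, c) * density
    return total * h

theta, n, c = 0.4, 5, 1.0   # illustrative values only
lhs = expected_estimator(theta, n, c)
rhs = Phi(c - theta)
print("E[estimator] ~", lhs, "  Phi(c - theta) =", rhs)
```

The two printed numbers should agree to within the quadrature error, which is consistent with the unbiasedness claimed in the example.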


Remark. We should like to draw the attention of the reader to a rather
important fact. This has to do with the adoption of a principle, such as the
principle of unbiasedness and minimum variance. A principle is not a theorem;
and seldom does a principle yield satisfactory results in all cases. So far, this
principle has provided quite satisfactory results. To see that this is not always
the case, let $X$ have a Poisson distribution with parameter $\theta$, $0 < \theta < \infty$. We
may look upon $X$ as a random sample of size 1 from this distribution. Thus
$X$ is a complete sufficient statistic for $\theta$. We seek the estimator of $e^{-2\theta}$ that is
unbiased and has minimum variance. Consider $Y = (-1)^X$. We have

$$E(Y) = E[(-1)^X] = \sum_{x=0}^{\infty} \frac{(-\theta)^x e^{-\theta}}{x!} = e^{-2\theta}.$$

Accordingly, $(-1)^X$ is the unbiased minimum variance estimator of $e^{-2\theta}$. Here
this estimator leaves much to be desired. We are endeavoring to elicit some
information about the number $e^{-2\theta}$, where $0 < e^{-2\theta} < 1$. Yet our point
estimate is either $-1$ or $+1$, each of which is a very poor estimate of a number
between zero and 1. We do not wish to leave the reader with the impression
that an unbiased minimum variance estimator is bad. That is not the case at
all. We merely wish to point out that if one tries hard enough, he can find
instances where such a statistic is not good. Incidentally, the maximum
likelihood estimator of $e^{-2\theta}$ is, in the case where the sample size equals 1, $e^{-2X}$,
which is probably a much better estimator in practice than is the unbiased
estimator $(-1)^X$.
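
To make the comparison in the Remark concrete, the mean and the mean squared error of both estimators can be computed by summing the Poisson series. This is a sketch under stated assumptions: mean squared error is chosen here as the yardstick (the text does not name one), the series is truncated at 100 terms, and theta = 1 is an illustrative value.

```python
from math import exp, factorial

def poisson_pmf(x, theta):
    """Poisson p.d.f. theta^x e^(-theta) / x!."""
    return theta**x * exp(-theta) / factorial(x)

def mean_of(estimator, theta, terms=100):
    """E[estimator(X)] for X ~ Poisson(theta), truncating the series."""
    return sum(estimator(x) * poisson_pmf(x, theta) for x in range(terms))

def mse_of(estimator, theta, terms=100):
    """Mean squared error of estimator(X) as an estimate of exp(-2 theta)."""
    target = exp(-2.0 * theta)
    return sum((estimator(x) - target) ** 2 * poisson_pmf(x, theta)
               for x in range(terms))

def unbiased(x):
    """The unbiased estimator (-1)^X from the Remark."""
    return (-1.0) ** x

def mle(x):
    """The maximum likelihood estimator exp(-2X) for a sample of size 1."""
    return exp(-2.0 * x)

theta = 1.0   # illustrative value only
print("target exp(-2 theta):", exp(-2.0 * theta))
print("E[(-1)^X]           :", mean_of(unbiased, theta))
print("E[exp(-2X)]         :", mean_of(mle, theta))
print("MSE of (-1)^X       :", mse_of(unbiased, theta))
print("MSE of exp(-2X)     :", mse_of(mle, theta))
```

For this value of theta the m.l.e. is biased but has the smaller mean squared error, which is in line with the Remark's informal judgment.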


EXERCISES 

7.40. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$N(\theta, 1)$, $-\infty < \theta < \infty$. Find the unbiased minimum variance estimator
of $\theta^2$.

Hint: First determine $E(\bar{X}^2)$.

7.41. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$N(0, \theta)$. Then $Y = \sum X_i^2$ is a complete sufficient statistic for $\theta$. Find the
unbiased minimum variance estimator of $\theta^2$.

7.42. In the notation of Example 2 of this section, is there an unbiased
minimum variance estimator of $\Pr(-c \le X \le c)$? Here $c > 0$.

7.43. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution
with parameter $\theta > 0$. Find the unbiased minimum variance estimator of
$\Pr(X \le 1) = (1 + \theta)e^{-\theta}$.
Hint: Let $u(x_1) = 1$, $x_1 \le 1$, zero elsewhere, and find $E[u(X_1) | Y = y]$,
where $Y = \sum_{i=1}^{n} X_i$. Make use of Example 2, Section 4.2.

7.44. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution
with parameter $\theta > 0$. From the Remark of this section, we know that
$E[(-1)^{X_1}] = e^{-2\theta}$.

(a) Show that $E[(-1)^{X_1} | Y_1 = y_1] = (1 - 2/n)^{y_1}$, where $Y_1 = X_1 + X_2 + \cdots + X_n$.

Hint: First show that the conditional p.d.f. of $X_1, X_2, \ldots, X_{n-1}$,
given $Y_1 = y_1$, is multinomial, and hence that of $X_1$ given $Y_1 = y_1$ is
$b(y_1, 1/n)$.

(b) Show that the m.l.e. of $e^{-2\theta}$ is $e^{-2\bar{X}}$.

(c) Since $y_1 = n\bar{x}$, show that $(1 - 2/n)^{y_1}$ is approximately equal to $e^{-2\bar{x}}$ when
$n$ is large.

7.45. Let a random sample of size $n$ be taken from a distribution that has the
p.d.f. $f(x; \theta) = (1/\theta) \exp(-x/\theta) I_{(0, \infty)}(x)$. Find the m.l.e. and the unbiased
minimum variance estimator of $\Pr(X \le 2)$.

7.7 The Case of Several Parameters 

In many of the interesting problems we encounter, the p.d.f. may
not depend upon a single parameter $\theta$, but perhaps upon two (or more)
parameters, say $\theta_1$ and $\theta_2$, where $(\theta_1, \theta_2) \in \Omega$, a two-dimensional
parameter space. We now define joint sufficient statistics for the
parameters. For the moment we shall restrict ourselves to the case of
two parameters.

Definition 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that has p.d.f. $f(x; \theta_1, \theta_2)$, where $(\theta_1, \theta_2) \in \Omega$. Let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ and $Y_2 = u_2(X_1, X_2, \ldots, X_n)$ be two statistics
whose joint p.d.f. is $g_{12}(y_1, y_2; \theta_1, \theta_2)$. The statistics $Y_1$ and $Y_2$ are called
joint sufficient statistics for $\theta_1$ and $\theta_2$ if and only if

$$\frac{f(x_1; \theta_1, \theta_2) f(x_2; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2)}{g_{12}[u_1(x_1, \ldots, x_n), u_2(x_1, \ldots, x_n); \theta_1, \theta_2]} = H(x_1, x_2, \ldots, x_n),$$

where $H(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta_1$ or $\theta_2$.

As may be anticipated, the factorization theorem can be extended.
In our notation it can be stated in the following manner. The statistics
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ and $Y_2 = u_2(X_1, X_2, \ldots, X_n)$ are joint sufficient
statistics for the parameters $\theta_1$ and $\theta_2$ if and only if we can find
two nonnegative functions $k_1$ and $k_2$ such that

$$f(x_1; \theta_1, \theta_2) f(x_2; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2)$$
$$= k_1[u_1(x_1, x_2, \ldots, x_n), u_2(x_1, x_2, \ldots, x_n); \theta_1, \theta_2]\, k_2(x_1, x_2, \ldots, x_n),$$

where the function $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon both or
either of $\theta_1$ and $\theta_2$.


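Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
having p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{2\theta_2}, \quad \theta_1 - \theta_2 < x < \theta_1 + \theta_2,$$
$$\phantom{f(x; \theta_1, \theta_2)} = 0 \quad \text{elsewhere},$$

where $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order
statistics. The joint p.d.f. of $Y_1$ and $Y_n$ is given by

$$g_{1n}(y_1, y_n; \theta_1, \theta_2) = \frac{n(n - 1)}{(2\theta_2)^n}(y_n - y_1)^{n-2}, \quad \theta_1 - \theta_2 < y_1 < y_n < \theta_1 + \theta_2,$$

and equals zero elsewhere. Accordingly, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ can
be written, for points of positive probability density,

$$\left(\frac{1}{2\theta_2}\right)^n = \frac{n(n - 1)[\max(x_i) - \min(x_i)]^{n-2}}{(2\theta_2)^n} \times \frac{1}{n(n - 1)[\max(x_i) - \min(x_i)]^{n-2}}.$$

Since $\min(x_i) \le x_j \le \max(x_i)$, $j = 1, 2, \ldots, n$, the last factor does not depend
upon the parameters. Either the definition or the factorization theorem
assures us that $Y_1$ and $Y_n$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.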


The extension of the notion of joint sufficient statistics for more
than two parameters is a natural one. Suppose that a certain p.d.f.
depends upon $m$ parameters. Let a random sample of size $n$ be taken
from the distribution that has this p.d.f. and define $m$ statistics. These
$m$ statistics are called joint sufficient statistics for the $m$ parameters if
and only if the ratio of the joint p.d.f. of the observations of the random
sample and the joint p.d.f. of these $m$ statistics does not depend upon
the $m$ parameters, whatever the fixed values of the $m$ statistics. Again
the factorization theorem is readily extended.

There is an extension of the Rao-Blackwell theorem that can be
adapted to joint sufficient statistics for several parameters, but that
extension will not be included in this book. However, the concept of
a complete family of probability density functions is generalized as
follows: Let

$$\{ f(v_1, v_2, \ldots, v_k; \theta_1, \theta_2, \ldots, \theta_m) : (\theta_1, \theta_2, \ldots, \theta_m) \in \Omega \}$$

denote a family of probability density functions of $k$ random variables
$V_1, V_2, \ldots, V_k$ that depends upon $m$ parameters $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$.
Let $u(v_1, v_2, \ldots, v_k)$ be a function of $v_1, v_2, \ldots, v_k$ (but not a function
of any or all of the parameters). If

$$E[u(V_1, V_2, \ldots, V_k)] = 0$$

for all $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$ implies that $u(v_1, v_2, \ldots, v_k) = 0$ at all
points $(v_1, v_2, \ldots, v_k)$, except on a set of points that has probability
zero for all members of the family of probability density functions, we
shall say that the family of probability density functions is a complete
family.

The remainder of our treatment of the case of several parameters
will be restricted to probability density functions that represent what
we shall call regular cases of the exponential class. Let $X_1, X_2, \ldots, X_n$,
$n > m$, denote a random sample from a distribution that depends on
$m$ parameters and has a p.d.f. of the form

$$f(x; \theta_1, \theta_2, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \theta_2, \ldots, \theta_m) K_j(x) + S(x) + q(\theta_1, \theta_2, \ldots, \theta_m)\right] \qquad (1)$$

for $a < x < b$, and equals zero elsewhere.

A p.d.f. of the form (1) is said to be a member of the exponential
class of probability density functions of the continuous type. If, in
addition,

1. neither $a$ nor $b$ depends upon any or all of the parameters
$\theta_1, \theta_2, \ldots, \theta_m$,

2. the $p_j(\theta_1, \theta_2, \ldots, \theta_m)$, $j = 1, 2, \ldots, m$, are nontrivial, functionally
independent, continuous functions of $\theta_j$, $\gamma_j < \theta_j < \delta_j$, $j = 1, 2, \ldots, m$,

3. the $K_j'(x)$, $j = 1, 2, \ldots, m$, are continuous for $a < x < b$ and no one
is a linear homogeneous function of the others,

4. $S(x)$ is a continuous function of $x$, $a < x < b$,

we say that we have a regular case of the exponential class.

The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is given, at points of positive
probability density, by

$$\exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) \sum_{i=1}^{n} K_j(x_i) + \sum_{i=1}^{n} S(x_i) + nq(\theta_1, \ldots, \theta_m)\right]$$
$$= \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) \sum_{i=1}^{n} K_j(x_i) + nq(\theta_1, \ldots, \theta_m)\right] \times \exp\left[\sum_{i=1}^{n} S(x_i)\right].$$

In accordance with the factorization theorem, the statistics

$$Y_1 = \sum_{i=1}^{n} K_1(X_i), \quad Y_2 = \sum_{i=1}^{n} K_2(X_i), \quad \ldots, \quad Y_m = \sum_{i=1}^{n} K_m(X_i)$$

are joint sufficient statistics for the $m$ parameters $\theta_1, \theta_2, \ldots, \theta_m$. It is
left as an exercise to prove that the joint p.d.f. of $Y_1, \ldots, Y_m$ is of
the form

$$R(y_1, \ldots, y_m) \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) y_j + nq(\theta_1, \ldots, \theta_m)\right] \qquad (2)$$

at points of positive probability density. These points of positive
probability density and the function $R(y_1, \ldots, y_m)$ do not depend upon
any or all of the parameters $\theta_1, \theta_2, \ldots, \theta_m$. Moreover, in accordance
with a theorem in analysis, it can be asserted that, in a regular case of
the exponential class, the family of probability density functions of
these joint sufficient statistics $Y_1, Y_2, \ldots, Y_m$ is complete when $n > m$.
In accordance with a convention previously adopted, we shall refer to
$Y_1, Y_2, \ldots, Y_m$ as joint complete sufficient statistics for the parameters
$\theta_1, \theta_2, \ldots, \theta_m$.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Thus the p.d.f.
$f(x; \theta_1, \theta_2)$ of the distribution may be written as

$$f(x; \theta_1, \theta_2) = \exp\left(-\frac{1}{2\theta_2} x^2 + \frac{\theta_1}{\theta_2} x - \frac{\theta_1^2}{2\theta_2} - \ln\sqrt{2\pi\theta_2}\right).$$

Therefore, we can take $K_1(x) = x^2$ and $K_2(x) = x$. Consequently, the statistics

$$Y_1 = \sum_{i=1}^{n} X_i^2 \quad \text{and} \quad Y_2 = \sum_{i=1}^{n} X_i$$

are joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Since the relations

$$Z_1 = \frac{Y_2}{n} = \bar{X}, \qquad Z_2 = \frac{Y_1 - Y_2^2/n}{n - 1} = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

define a one-to-one transformation, $Z_1$ and $Z_2$ are also joint complete
sufficient statistics for $\theta_1$ and $\theta_2$. Moreover,

$$E(Z_1) = \theta_1 \quad \text{and} \quad E(Z_2) = \theta_2.$$

From completeness, we have that $Z_1$ and $Z_2$ are the only functions of $Y_1$ and
$Y_2$ that are unbiased estimators of $\theta_1$ and $\theta_2$, respectively.
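
The one-to-one relation between $(Y_1, Y_2)$ and $(Z_1, Z_2)$ in Example 2 can be illustrated with a few lines of code. The sketch below is an illustration only: the function names and the data are hypothetical, and the standard library's `statistics.mean` and `statistics.variance` (the latter uses the divisor n - 1) serve as an independent check.

```python
from statistics import mean, variance

def joint_sufficient(xs):
    """Y1 = sum of squares and Y2 = sum, as in Example 2."""
    return sum(x * x for x in xs), sum(xs)

def to_z(y1, y2, n):
    """Z1 = Y2/n and Z2 = (Y1 - Y2^2/n)/(n - 1)."""
    return y2 / n, (y1 - y2 * y2 / n) / (n - 1)

def to_y(z1, z2, n):
    """Inverse map, showing that the transformation is one-to-one."""
    return (n - 1) * z2 + n * z1 * z1, n * z1

sample = [2.1, 3.4, 1.9, 4.0, 2.6]   # hypothetical data
n = len(sample)
y1, y2 = joint_sufficient(sample)
z1, z2 = to_z(y1, y2, n)
back_y1, back_y2 = to_y(z1, z2, n)

print("Y1, Y2 =", y1, y2)
print("Z1, Z2 =", z1, z2)
print("sample mean and variance =", mean(sample), variance(sample))
```

Up to rounding, $Z_1$ and $Z_2$ coincide with the sample mean and the sample variance with divisor n - 1, and mapping back recovers $Y_1$ and $Y_2$.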

A p.d.f.

$$f(x; \theta_1, \theta_2, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \theta_2, \ldots, \theta_m) K_j(x) + S(x) + q(\theta_1, \theta_2, \ldots, \theta_m)\right], \quad x = a_1, a_2, \ldots,$$

zero elsewhere, is said to represent a regular case of the exponential
class of probability density functions of the discrete type if

1. the set $\{x : x = a_1, a_2, \ldots\}$ does not depend upon any or all of the
parameters $\theta_1, \theta_2, \ldots, \theta_m$,

2. the $p_j(\theta_1, \theta_2, \ldots, \theta_m)$, $j = 1, 2, \ldots, m$, are nontrivial, functionally
independent, and continuous functions of $\theta_j$, $\gamma_j < \theta_j < \delta_j$, $j = 1, 2, \ldots, m$,

3. the $K_j(x)$, $j = 1, 2, \ldots, m$, are nontrivial functions of $x$ on the set
$\{x : x = a_1, a_2, \ldots\}$ and no one is a linear function of the others.

Let $X_1, X_2, \ldots, X_n$ denote a random sample from a discrete-type
distribution that represents a regular case of the exponential class.
Then the statements made above in connection with the random
variable of the continuous type are also valid here.

Not always do we sample from a distribution of one random
variable $X$. We could, for instance, sample from a distribution of two
random variables $V$ and $W$ with joint p.d.f. $f(v, w; \theta_1, \theta_2, \ldots, \theta_m)$.
Recall that by a random sample $(V_1, W_1), (V_2, W_2), \ldots, (V_n, W_n)$ from
a distribution of this sort, we mean that the joint p.d.f. of these $2n$
random variables is given by

$$f(v_1, w_1; \theta_1, \ldots, \theta_m) f(v_2, w_2; \theta_1, \ldots, \theta_m) \cdots f(v_n, w_n; \theta_1, \ldots, \theta_m).$$

In particular, suppose that the random sample is taken from a
distribution that has the p.d.f. of $V$ and $W$ of the exponential class

$$f(v, w; \theta_1, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) K_j(v, w) + S(v, w) + q(\theta_1, \ldots, \theta_m)\right] \qquad (3)$$

for $a < v < b$, $c < w < d$, and equals zero elsewhere, where $a$, $b$, $c$, $d$ do
not depend on the parameters and conditions similar to 1 to 4 above
are imposed. Then the $m$ statistics

$$Y_1 = \sum_{i=1}^{n} K_1(V_i, W_i), \quad \ldots, \quad Y_m = \sum_{i=1}^{n} K_m(V_i, W_i)$$

are joint complete sufficient statistics for the $m$ parameters
$\theta_1, \theta_2, \ldots, \theta_m$.


EXERCISES 


7.46. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3
from the distribution with p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2} \exp\left(-\frac{x - \theta_1}{\theta_2}\right),$$

$\theta_1 < x < \infty$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$,

zero elsewhere. Find the joint p.d.f. of $Z_1 = Y_1$, $Z_2 = Y_2$, and $Z_3 = Y_1 + Y_2 + Y_3$.
The corresponding transformation maps the space
$\{(y_1, y_2, y_3) : \theta_1 < y_1 < y_2 < y_3 < \infty\}$ onto the space

$$\{(z_1, z_2, z_3) : \theta_1 < z_1 < z_2 < (z_3 - z_1)/2 < \infty\}.$$

Show that $Z_1$ and $Z_3$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.

7.47. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that has
a p.d.f. of form (1) of this section. Show that $Y_1 = \sum_{i=1}^{n} K_1(X_i), \ldots,
Y_m = \sum_{i=1}^{n} K_m(X_i)$ have a joint p.d.f. of form (2) of this section.

7.48. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of size $n$
from a bivariate normal distribution with means $\mu_1$ and $\mu_2$, positive
variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. Show that $\sum_{i=1}^{n} X_i$,
$\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i^2$, $\sum_{i=1}^{n} Y_i^2$, and $\sum_{i=1}^{n} X_i Y_i$ are joint complete sufficient
statistics for the five parameters. Are $\bar{X} = \sum_{i=1}^{n} X_i/n$, $\bar{Y} = \sum_{i=1}^{n} Y_i/n$,
$S_1^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n$, $S_2^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/n$, and
$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})/nS_1S_2$ also joint complete sufficient
statistics for these parameters?

7.49. Let the p.d.f. $f(x; \theta_1, \theta_2)$ be of the form

$$\exp[p_1(\theta_1, \theta_2)K_1(x) + p_2(\theta_1, \theta_2)K_2(x) + S(x) + q(\theta_1, \theta_2)], \quad a < x < b,$$

zero elsewhere. Let $K_1'(x) = cK_2'(x)$. Show that $f(x; \theta_1, \theta_2)$ can be written
in the form

$$\exp[p(\theta_1, \theta_2)K(x) + S(x) + q_1(\theta_1, \theta_2)], \quad a < x < b,$$

zero elsewhere. This is the reason why it is required that no one $K_j'(x)$ be
a linear homogeneous function of the others, that is, so that the number of
sufficient statistics equals the number of parameters.

7.50. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample
$X_1, X_2, \ldots, X_n$ of size $n$ from a distribution of the continuous type with
p.d.f. $f(x)$. Show that the ratio of the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and that
of $Y_1 < Y_2 < \cdots < Y_n$ is equal to $1/n!$, which does not depend upon the
underlying p.d.f. This suggests that $Y_1 < Y_2 < \cdots < Y_n$ are joint sufficient
statistics for the unknown "parameter" $f$.

7.51. Let $X_1, X_2, \ldots, X_n$ be a random sample from the uniform distribution
with p.d.f. $f(x; \theta_1, \theta_2) = 1/(2\theta_2)$, $\theta_1 - \theta_2 < x < \theta_1 + \theta_2$, where
$-\infty < \theta_1 < \infty$ and $\theta_2 > 0$, and the p.d.f. is equal to zero elsewhere.

(a) Show that $Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$, the joint sufficient
statistics for $\theta_1$ and $\theta_2$, are complete.

(b) Find the unbiased minimum variance estimators of $\theta_1$ and $\theta_2$.

7.52. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\theta_1, \theta_2)$.

(a) If the constant $b$ is defined by the equation $\Pr(X \le b) = 0.90$, find the
m.l.e. and the unbiased minimum variance estimator of $b$.

(b) If $c$ is a given constant, find the m.l.e. and the unbiased minimum
variance estimator of $\Pr(X \le c)$.

7.8 Minimal Sufficient and Ancillary Statistics 

In the study of statistics, it is clear that we want to reduce the data
contained in the entire sample as much as possible without losing
relevant information about the important characteristics of the
underlying distribution. That is, a large collection of numbers in the
sample is not as meaningful as a few good summary statistics of those
data. Sufficient statistics, if they exist, are valuable because we know
that the statistician with those summary measures is as well off as the
statistician with the entire sample. Sometimes, however, there are
several sets of joint sufficient statistics, and thus we would like to find
the simplest one of these sets. For illustration, in a sense, the
observations $X_1, X_2, \ldots, X_n$, $n > 2$, of a random sample from
$N(\theta_1, \theta_2)$ could be thought of as joint sufficient statistics for $\theta_1$ and $\theta_2$.
We know, however, that we can use $\bar{X}$ and $S^2$ as joint sufficient statistics
for those parameters, which is a great simplification over using
$X_1, X_2, \ldots, X_n$, particularly if $n$ is large.

In most instances in this chapter, we have been able to find a single 
sufficient statistic for one parameter or two joint sufficient statistics for 
two parameters. Possibly the most complicated case considered so far 
is given in Exercise 7.48, in which we find five joint sufficient statistics 
for five parameters. Exercise 7.50 suggests the possibility of using the 
order statistics of a random sample for some completely unknown 
distribution of the continuous type. 

What we would like to do is to change from one set of joint sufficient
statistics to another, always reducing the number of statistics involved
until we cannot go any further without losing the sufficiency of the
resulting statistics. Those statistics that are there at the end of this
process are called minimal sufficient statistics for the parameters. That
is, minimal sufficient statistics are those that are sufficient for the
parameters and are functions of every other set of sufficient statistics
for those same parameters. Often, if there are $k$ parameters, we can find
$k$ joint sufficient statistics that are minimal. In particular, if there is one
parameter, we can often find a single sufficient statistic which is
minimal. Most of the earlier examples that we have considered
illustrate this point, but this is not always the case, as shown by the
following example.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from the uniform
distribution over the interval $(\theta - 1, \theta + 1)$ having p.d.f.

$$f(x; \theta) = (\tfrac{1}{2}) I_{(\theta - 1, \theta + 1)}(x), \quad \text{where } -\infty < \theta < \infty.$$

The joint p.d.f. of $X_1, X_2, \ldots, X_n$ equals the product of $(\tfrac{1}{2})^n$ and certain
indicator functions, namely

$$(\tfrac{1}{2})^n \prod_{i=1}^{n} I_{(\theta - 1, \theta + 1)}(x_i) = (\tfrac{1}{2})^n \{I_{(\theta - 1, \theta + 1)}[\min(x_i)]\}\{I_{(\theta - 1, \theta + 1)}[\max(x_i)]\},$$

because $\theta - 1 < \min(x_i) \le x_j \le \max(x_i) < \theta + 1$, $j = 1, 2, \ldots, n$. Thus the
order statistics $Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$ are the sufficient statistics for
$\theta$. These two statistics actually are minimal for this one parameter, as
we cannot reduce the number of them to less than two and still have
sufficiency.

There is an observation that helps us see that almost all the
sufficient statistics that we have studied thus far are minimal. We have
noted that the m.l.e. $\hat{\theta}$ of $\theta$ is a function of one or more sufficient
statistics, when the latter exist. Suppose that this m.l.e. $\hat{\theta}$ is also
sufficient. Since this sufficient statistic $\hat{\theta}$ is a function of the other
sufficient statistics, it must be minimal. For example, we have

1. The m.l.e. $\hat{\theta} = \bar{X}$ of $\theta$ in $N(\theta, \sigma^2)$, $\sigma^2$ known, is a minimal sufficient
statistic for $\theta$.

2. The m.l.e. $\hat{\theta} = \bar{X}$ of $\theta$ in a Poisson distribution with mean $\theta$ is a
minimal sufficient statistic for $\theta$.

3. The m.l.e. $\hat{\theta} = Y_n = \max(X_i)$ of $\theta$ in the uniform distribution over
$(0, \theta)$ is a minimal sufficient statistic for $\theta$.

4. The maximum likelihood estimators $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$ of $\theta_1$ and
$\theta_2$ in $N(\theta_1, \theta_2)$ are joint minimal sufficient statistics for $\theta_1$ and $\theta_2$.

From these examples we see that the minimal sufficient statistics do
not need to be unique, for any one-to-one transformation of them also
provides minimal sufficient statistics. For illustration, in 4, the $\sum X_i$ and
$\sum X_i^2$ are also minimal sufficient statistics for $\theta_1$ and $\theta_2$.

Example 2. Consider the model given in Example 1. There we noted that
$Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$ are joint sufficient statistics. Also, we have

$$\theta - 1 < Y_1 < Y_n < \theta + 1$$

or, equivalently,

$$Y_n - 1 < \theta < Y_1 + 1.$$

Hence, to maximize the likelihood function so that it equals $(\tfrac{1}{2})^n$, $\theta$ can be any
value between $Y_n - 1$ and $Y_1 + 1$. For example, many statisticians take the
m.l.e. to be the mean of these two end points, namely

$$\hat{\theta} = \frac{Y_n - 1 + Y_1 + 1}{2} = \frac{Y_1 + Y_n}{2},$$

which is the midrange. We recognize, however, that this m.l.e. is not unique.
Some might argue that since $\hat{\theta}$ is an m.l.e. of $\theta$ and since it is a function of
the joint sufficient statistics, $Y_1$ and $Y_n$, for $\theta$, it will be a minimal sufficient
statistic. This is not the case at all, for $\hat{\theta}$ is not even sufficient. Note that the
m.l.e. must itself be a sufficient statistic for the parameter before it can be
considered the minimal sufficient statistic.
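
The two points made in Example 2 can be seen numerically. The sketch below is illustrative only: the function names and the data are hypothetical. It shows that the likelihood is constant on the interval from $Y_n - 1$ to $Y_1 + 1$, and that two samples with the same midrange can have different likelihood functions, so the midrange alone does not carry all the information about $\theta$.

```python
def likelihood(theta, xs):
    """Likelihood of theta for a sample from the uniform distribution on (theta - 1, theta + 1)."""
    inside = all(theta - 1 < x < theta + 1 for x in xs)
    return 0.5 ** len(xs) if inside else 0.0

def mle_interval(xs):
    """End points (Y_n - 1, Y_1 + 1) of the set of maximizing values of theta."""
    return max(xs) - 1, min(xs) + 1

def midrange(xs):
    """(Y_1 + Y_n)/2, one particular choice of m.l.e."""
    return (min(xs) + max(xs)) / 2

sample = [3.25, 3.75, 2.75, 3.5]          # hypothetical data
lo, hi = mle_interval(sample)
print("maximizing interval:", (lo, hi), " midrange:", midrange(sample))
for theta in (2.7, 2.8, 3.25, 3.7, 3.8):
    print("theta =", theta, " likelihood =", likelihood(theta, sample))

# Two hypothetical samples with the same midrange but different likelihoods.
sample_a = [3.0, 3.5]
sample_b = [2.5, 4.0]
print(midrange(sample_a), midrange(sample_b))
print(likelihood(3.75, sample_a), likelihood(3.75, sample_b))
```

In the second comparison both samples have midrange 3.25, yet $\theta = 3.75$ is possible for the first sample and impossible for the second.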
There is also a relationship between a minimal sufficient statistic
and completeness that is explained more fully in the 1950 article by
Lehmann and Scheffé. Let us say simply and without explanation that
for the cases in this book, complete sufficient statistics are minimal
sufficient statistics. The converse is not true, however, as can be seen
by noting that in Example 1 we have

$$E\left[\frac{Y_n - Y_1}{2} - \frac{n - 1}{n + 1}\right] = 0, \quad \text{for all } \theta.$$

That is, there is a nonzero function of those minimal sufficient
statistics, $Y_1$ and $Y_n$, whose expectation is zero for all $\theta$.

There are other statistics that almost seem opposites of sufficient
statistics. That is, while sufficient statistics contain all the information
about the parameters, these other statistics, called ancillary statistics,
have distributions free of the parameters and seemingly contain no
information about those parameters. As an illustration, we know that
the variance $S^2$ of a random sample from $N(\theta, 1)$ has a distribution that
does not depend upon $\theta$ and hence is an ancillary statistic. Another
example is the ratio $Z = X_1/(X_1 + X_2)$, where $X_1, X_2$ is a random
sample from a gamma distribution with known parameter $\alpha > 0$ and
unknown parameter $\beta = \theta$, because $Z$ has a beta distribution that is
free of $\theta$. There are a great number of examples of ancillary statistics,
and we provide some rules that make them rather easy to find with
certain models.

First consider the situation in which there is a location parameter.
That is, let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution
that has a p.d.f. of the form $f(x - \theta)$, for every real $\theta$; that is, $\theta$ is a
location parameter. Let $Z = u(X_1, X_2, \ldots, X_n)$ be a statistic such that

$$u(x_1 + d, x_2 + d, \ldots, x_n + d) = u(x_1, x_2, \ldots, x_n),$$

for all real $d$. The one-to-one transformation defined by $W_i = X_i - \theta$,
$i = 1, 2, \ldots, n$, requires that the joint p.d.f. of $W_1, W_2, \ldots, W_n$ be

$$f(w_1) f(w_2) \cdots f(w_n),$$

which does not depend upon $\theta$. In addition, we have, because of the
special functional nature of $u(x_1, x_2, \ldots, x_n)$, that

$$Z = u(W_1 + \theta, W_2 + \theta, \ldots, W_n + \theta) = u(W_1, W_2, \ldots, W_n)$$

is a function of $W_1, W_2, \ldots, W_n$ alone (not of $\theta$). Hence $Z$ must have
a distribution that does not depend upon $\theta$ because, for illustration,
the m.g.f. of $Z$, namely

$$E(e^{tZ}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{tu(x_1, \ldots, x_n)} f(x_1 - \theta) \cdots f(x_n - \theta)\, dx_1 \cdots dx_n$$
$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{tu(w_1, \ldots, w_n)} f(w_1) \cdots f(w_n)\, dw_1 \cdots dw_n,$$

is free of $\theta$. We call $Z = u(X_1, X_2, \ldots, X_n)$ a location-invariant statistic.
We immediately see that we can construct many examples of
location-invariant statistics: the sample variance $= S^2$, the sample
range $= Y_n - Y_1$, the mean deviation from the sample median $=
(1/n) \sum |X_i - \text{median}(X_i)|$, $X_1 + X_2 - X_3 - X_4$, $X_1 + X_3 - 2X_2$,
$(1/n) \sum [X_i - \min(X_i)]$, and so on.
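
The defining property $u(x_1 + d, \ldots, x_n + d) = u(x_1, \ldots, x_n)$ is easy to check numerically for the statistics just listed. The following sketch uses hypothetical data and helper names; the sample mean is included only as a contrast, since it is not location-invariant.

```python
from statistics import mean, median, variance

def sample_range(xs):
    """Y_n - Y_1."""
    return max(xs) - min(xs)

def mean_dev_from_median(xs):
    """(1/n) * sum |x_i - median|."""
    m = median(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def contrast(xs):
    """x_1 + x_2 - x_3 - x_4 (needs at least four observations)."""
    return xs[0] + xs[1] - xs[2] - xs[3]

def mean_excess_over_min(xs):
    """(1/n) * sum [x_i - min(x_i)]."""
    lo = min(xs)
    return sum(x - lo for x in xs) / len(xs)

def change_under_shift(stat, xs, d):
    """|u(x + d) - u(x)|; zero, up to rounding, for a location-invariant u."""
    return abs(stat([x + d for x in xs]) - stat(xs))

location_invariant = [variance, sample_range, mean_dev_from_median,
                      contrast, mean_excess_over_min]

sample = [1.2, -0.7, 3.1, 0.4, 2.2]   # hypothetical data
d = 7.5                               # arbitrary shift
for stat in location_invariant:
    print(stat.__name__, change_under_shift(stat, sample, d))
print("mean (not invariant)", change_under_shift(mean, sample, d))
```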

We now consider a scale-invariant statistic. Let $X_1, X_2, \ldots, X_n$ be
a random sample from a distribution that has a p.d.f. of the form
$(1/\theta) f(x/\theta)$, for all $\theta > 0$; that is, $\theta$ is a scale parameter. Say that
$Z = u(X_1, X_2, \ldots, X_n)$ is a statistic such that

$$u(cx_1, cx_2, \ldots, cx_n) = u(x_1, x_2, \ldots, x_n)$$

for all $c > 0$. The one-to-one transformation defined by $W_i = X_i/\theta$,
$i = 1, 2, \ldots, n$, requires the following: (1) that the joint p.d.f. of
$W_1, W_2, \ldots, W_n$ be equal to

$$f(w_1) f(w_2) \cdots f(w_n),$$

and (2) that the statistic $Z$ be equal to

$$Z = u(\theta W_1, \theta W_2, \ldots, \theta W_n) = u(W_1, W_2, \ldots, W_n).$$

Since neither the joint p.d.f. of $W_1, W_2, \ldots, W_n$ nor $Z$ contains $\theta$, the
distribution of $Z$ must not depend upon $\theta$. There are also many examples
of scale-invariant statistics like this $Z$: $X_1/(X_1 + X_2)$,
$X_1^2 / \sum_{i=1}^{n} X_i^2$, $\min(X_i)/\max(X_i)$, and so on.
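
The property $u(cx_1, \ldots, cx_n) = u(x_1, \ldots, x_n)$ for $c > 0$ can be checked in the same spirit. Again this is only a sketch with hypothetical data and helper names; the sample range serves as a contrast because it scales with $c$.

```python
def first_over_first_two(xs):
    """x_1 / (x_1 + x_2)."""
    return xs[0] / (xs[0] + xs[1])

def first_square_share(xs):
    """x_1^2 / sum of x_i^2."""
    return xs[0] ** 2 / sum(x * x for x in xs)

def min_over_max(xs):
    """min(x_i) / max(x_i)."""
    return min(xs) / max(xs)

def sample_range(xs):
    """max(x_i) - min(x_i); location-invariant but not scale-invariant."""
    return max(xs) - min(xs)

def change_under_scaling(stat, xs, c):
    """|u(c x) - u(x)|; zero, up to rounding, for a scale-invariant u."""
    return abs(stat([c * x for x in xs]) - stat(xs))

scale_invariant = [first_over_first_two, first_square_share, min_over_max]

sample = [1.5, 0.5, 2.0, 4.0]   # hypothetical positive data
c = 3.0                         # arbitrary positive scale factor
for stat in scale_invariant:
    print(stat.__name__, change_under_scaling(stat, sample, c))
print("sample_range (not invariant)", change_under_scaling(sample_range, sample, c))
```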

Finally, the location and the scale parameters can be combined in
a p.d.f. of the form $(1/\theta_2) f[(x - \theta_1)/\theta_2]$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$.
Through a one-to-one transformation defined by $W_i = (X_i - \theta_1)/\theta_2$,
$i = 1, 2, \ldots, n$, it is easy to show that a statistic $Z = u(X_1, X_2, \ldots, X_n)$
such that

$$u(cx_1 + d, \ldots, cx_n + d) = u(x_1, \ldots, x_n)$$

for $-\infty < d < \infty$, $0 < c < \infty$, has a distribution that does not depend
upon $\theta_1$ and $\theta_2$. Statistics like this $Z = u(X_1, X_2, \ldots, X_n)$ are
location-and-scale-invariant statistics. Again there are many examples:
$[\max(X_i) - \min(X_i)]/S$, $\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2/S^2$, $(X_i - \bar{X})/S$, $|X_i - X_j|/S$,
$i \ne j$, and so on.

Thus these location-invariant, scale-invariant, and
location-and-scale-invariant statistics provide good illustrations, with the
appropriate model for the p.d.f., of ancillary statistics. Since an 
ancillary statistic and a complete (minimal) sufficient statistic are such 
opposites, we might believe that there is, in some sense, no relationship 
between the two. This is true and in the next section we show that they 
are independent statistics. 

EXERCISES 

7.53. Let $X_1, X_2, \ldots, X_n$ be a random sample from each of the following
distributions involving the parameter $\theta$. In each case find the m.l.e. of $\theta$ and
show that it is a sufficient statistic for $\theta$ and hence a minimal sufficient
statistic.

(a) $b(1, \theta)$, where $0 < \theta < 1$.

(b) Poisson with mean $\theta > 0$.

(c) Gamma with $\alpha = 3$ and $\beta = \theta > 0$.

(d) $N(\theta, 1)$, where $-\infty < \theta < \infty$.

(e) $N(0, \theta)$, where $0 < \theta < \infty$.

7.54. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the uniform distribution over the closed interval $[-\theta, \theta]$ having
p.d.f. $f(x; \theta) = (1/2\theta) I_{[-\theta, \theta]}(x)$.

(a) Show that $Y_1$ and $Y_n$ are joint sufficient statistics for $\theta$.

(b) Argue that the m.l.e. of $\theta$ equals $\hat{\theta} = \max(-Y_1, Y_n)$.

(c) Demonstrate that the m.l.e. $\hat{\theta}$ is a sufficient statistic for $\theta$ and thus is
a minimal sufficient statistic for $\theta$.

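7.55. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from a distribution with p.d.f.

$$f(x; \theta_1, \theta_2) = \left(\frac{1}{\theta_2}\right) e^{-(x - \theta_1)/\theta_2} I_{(\theta_1, \infty)}(x),$$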

where $-\infty < \theta_1 < \infty$ and $0 < \theta_2 < \infty$. Find joint minimal sufficient
statistics for $\theta_1$ and $\theta_2$.

7.56. With random samples from each of the distributions given in Exercises
7.53(d), 7.54, and 7.55, define at least two ancillary statistics that are 
different from the examples given in the text. These examples illustrate, 
respectively, location-invariant, scale-invariant, and location-and-scale- 
invariant statistics. 

7.9 Sufficiency, Completeness, and Independence 

We have noted that if we have a sufficient statistic $Y_1$ for a
parameter $\theta$, $\theta \in \Omega$, then $h(z|y_1)$, the conditional p.d.f. of another
statistic $Z$, given $Y_1 = y_1$, does not depend upon $\theta$. If, moreover, $Y_1$ and
$Z$ are independent, the p.d.f. $g_2(z)$ of $Z$ is such that $g_2(z) = h(z|y_1)$, and
hence $g_2(z)$ must not depend upon $\theta$ either. So the independence of a
statistic $Z$ and the sufficient statistic $Y_1$ for a parameter $\theta$ means that
the distribution of $Z$ does not depend upon $\theta \in \Omega$. That is, $Z$ is an
ancillary statistic.

It is interesting to investigate a converse of that property. Suppose
that the distribution of an ancillary statistic $Z$ does not depend upon
$\theta$; then, are $Z$ and the sufficient statistic $Y_1$ for $\theta$ independent? To begin
our search for the answer, we know that the joint p.d.f. of $Y_1$ and $Z$
is $g_1(y_1; \theta) h(z|y_1)$, where $g_1(y_1; \theta)$ and $h(z|y_1)$ represent the marginal
p.d.f. of $Y_1$ and the conditional p.d.f. of $Z$ given $Y_1 = y_1$, respectively.
Thus the marginal p.d.f. of $Z$ is

$$\int_{-\infty}^{\infty} g_1(y_1; \theta) h(z|y_1) \, dy_1 = g_2(z),$$

which, by hypothesis, does not depend upon $\theta$. Because

$$\int_{-\infty}^{\infty} g_2(z) g_1(y_1; \theta) \, dy_1 = g_2(z),$$

it follows, by taking the difference of the last two integrals, that

$$\int_{-\infty}^{\infty} [g_2(z) - h(z|y_1)] g_1(y_1; \theta) \, dy_1 = 0 \qquad (1)$$

for all $\theta \in \Omega$. Since $Y_1$ is a sufficient statistic for $\theta$, $h(z|y_1)$ does not
depend upon $\theta$. By assumption, $g_2(z)$ and hence $g_2(z) - h(z|y_1)$ do not
depend upon $\theta$. Now if the family $\{g_1(y_1; \theta): \theta \in \Omega\}$ is complete,
Equation (1) would require that

$$g_2(z) - h(z|y_1) = 0 \quad \text{or} \quad g_2(z) = h(z|y_1).$$

That is, the joint p.d.f. of $Y_1$ and $Z$ must be equal to

$$g_1(y_1; \theta) h(z|y_1) = g_1(y_1; \theta) g_2(z).$$

Accordingly, Y { and Z are independent, and we have proved the 
following theorem, which was considered in special cases by Neyman 
and Hogg and proved in general by Basu. 

Theorem 6. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution having a p.d.f. $f(x; \theta)$, $\theta \in \Omega$, where $\Omega$ is an interval set. Let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let the family
$\{g_1(y_1; \theta): \theta \in \Omega\}$ of probability density functions of $Y_1$ be complete. Let
$Z = u(X_1, X_2, \ldots, X_n)$ be any other statistic (not a function of $Y_1$ alone).
If the distribution of $Z$ does not depend upon $\theta$, then $Z$ is independent of
the sufficient statistic $Y_1$.

In the discussion above, it is interesting to observe that if $Y_1$ is a
sufficient statistic for $\theta$, then the independence of $Y_1$ and $Z$ implies
that the distribution of $Z$ does not depend upon $\theta$ whether
$\{g_1(y_1; \theta): \theta \in \Omega\}$ is or is not complete. However, in the converse, to
prove the independence from the fact that $g_2(z)$ does not depend upon
$\theta$, we definitely need the completeness. Accordingly, if we are dealing
with situations in which we know that the family $\{g_1(y_1; \theta): \theta \in \Omega\}$ is
complete (such as a regular case of the exponential class), we can say
that the statistic $Z$ is independent of the sufficient statistic $Y_1$ if, and
only if, the distribution of $Z$ does not depend upon $\theta$ (i.e., $Z$ is an
ancillary statistic).

It should be remarked that the theorem (including the special
formulation of it for regular cases of the exponential class) extends
immediately to probability density functions that involve $m$ parameters
for which there exist $m$ joint sufficient statistics. For example, let
$X_1, X_2, \ldots, X_n$ be a random sample from a distribution having the
p.d.f. $f(x; \theta_1, \theta_2)$ that represents a regular case of the exponential class
such that there are two joint complete sufficient statistics for $\theta_1$ and $\theta_2$.
Then any other statistic $Z = u(X_1, X_2, \ldots, X_n)$ is independent of the
joint complete sufficient statistics if and only if the distribution of $Z$
does not depend upon $\theta_1$ or $\theta_2$.

We give an example of the theorem that provides an alternative
proof of the independence of $\bar{X}$ and $S^2$, the mean and the variance of
a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. This
proof is presented as if we did not know that $nS^2/\sigma^2$ is $\chi^2(n - 1)$
because that fact and the independence were established in the
same argument (see Section 4.8).

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution that is $N(\mu, \sigma^2)$. We know that the mean $\bar{X}$ of the sample is, for
every known $\sigma^2$, a complete sufficient statistic for the parameter $\mu$,
$-\infty < \mu < \infty$. Consider the statistic

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$

which is location-invariant. Thus $S^2$ must have a distribution that does not
depend upon $\mu$; and hence, by the theorem, $S^2$ and $\bar{X}$, the complete sufficient
statistic for $\mu$, are independent.
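
The independence claimed in Example 1 can be checked informally by simulation. The sketch below is only an illustration, not part of the proof: the sample size, number of replications, seed, and the exponential comparison case are all assumptions chosen for the demonstration. It estimates the correlation between the sample mean and the divisor-$n$ sample variance. A correlation near zero is consistent with independence (though it does not prove it), while the skewed comparison case shows a clearly positive correlation.

```python
import random


def mean_and_variance(sample):
    """Return (x_bar, s2), where s2 uses the divisor n as in the text."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / n
    return x_bar, s2


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_correlation(draw, n=10, reps=20000, seed=0):
    """Correlation between x_bar and s2 over `reps` simulated samples."""
    rng = random.Random(seed)
    pairs = [mean_and_variance([draw(rng) for _ in range(n)]) for _ in range(reps)]
    means, variances = zip(*pairs)
    return correlation(means, variances)


# Normal data: the theorem gives independence, so the correlation is near 0.
r_normal = simulate_correlation(lambda rng: rng.gauss(3.0, 2.0))
# Exponential data (an assumed skewed comparison case): clearly positive.
r_expo = simulate_correlation(lambda rng: rng.expovariate(1.0))
print(round(r_normal, 3), round(r_expo, 3))
```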

Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the
distribution having p.d.f.

$$f(x; \theta) = e^{-(x - \theta)}, \quad \theta < x < \infty, \quad -\infty < \theta < \infty,$$

$= 0$ elsewhere.

Here the p.d.f. is of the form $f(x - \theta)$, where $f(x) = e^{-x}$, $0 < x < \infty$, zero
elsewhere. Moreover, we know (Exercise 7.27) that the first order statistic
$Y_1 = \min(X_i)$ is a complete sufficient statistic for $\theta$. Hence $Y_1$ must be
independent of each location-invariant statistic $u(X_1, X_2, \ldots, X_n)$, enjoying
the property that

$$u(x_1 + d, x_2 + d, \ldots, x_n + d) = u(x_1, x_2, \ldots, x_n)$$

for all real $d$. Illustrations of such statistics are $S^2$, the sample range, and

$$\frac{1}{n} \sum_{i=1}^{n} [X_i - \min(X_i)].$$


Example 3. Let $X_1, X_2$ denote a random sample of size $n = 2$ from a
distribution with p.d.f.

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < \infty, \quad 0 < \theta < \infty,$$

$= 0$ elsewhere.

The p.d.f. is of the form $(1/\theta) f(x/\theta)$, where $f(x) = e^{-x}$, $0 < x < \infty$, zero
elsewhere. We know (Section 7.5) that $Y_1 = X_1 + X_2$ is a complete sufficient
statistic for $\theta$. Hence $Y_1$ is independent of every scale-invariant statistic
$u(X_1, X_2)$ with the property $u(cx_1, cx_2) = u(x_1, x_2)$. Illustrations of these are
$X_1/X_2$ and $X_1/(X_1 + X_2)$, statistics that have $F$ and beta distributions,
respectively.

Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. In Example 2,
Section 7.7, it was proved that the mean $\bar{X}$ and the variance $S^2$ of the sample
are joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Consider the statistic

$$Z = \frac{\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = u(X_1, X_2, \ldots, X_n),$$

which satisfies the property that $u(cx_1 + d, \ldots, cx_n + d) = u(x_1, \ldots, x_n)$.
That is, the ancillary statistic $Z$ is independent of both $\bar{X}$ and $S^2$.

Let $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$ denote two normal distributions. Recall
that in Example 2, Section 6.5, a statistic, which was denoted by $T$, was
used to test the hypothesis that $\theta_1 = \theta_2$, provided that the unknown
variances $\theta_3$ and $\theta_4$ were equal. The hypothesis that $\theta_1 = \theta_2$ is rejected
if the computed $|T| > c$, where the constant $c$ is selected so that
$\alpha_2 = \Pr(|T| > c; \theta_1 = \theta_2, \theta_3 = \theta_4)$ is the assigned significance level of
the test. We shall show that, if $\theta_3 = \theta_4$, $F$ of Exercise 6.52 and $T$ are
independent. Among other things, this means that if these two tests
based on $F$ and $T$, respectively, are performed sequentially, with
significance levels $\alpha_1$ and $\alpha_2$, the probability of accepting both these
hypotheses, when they are true, is $(1 - \alpha_1)(1 - \alpha_2)$. Thus the
significance level of this joint test is $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$.
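
As a small numerical illustration of the joint significance level $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$, the following sketch evaluates the formula directly. The function name and the levels 0.05 and 0.05 are illustrative assumptions, not values taken from the text.

```python
def joint_significance_level(alpha_1, alpha_2):
    """Significance level of two independent tests performed in sequence."""
    return 1.0 - (1.0 - alpha_1) * (1.0 - alpha_2)


# Two tests, each at an assumed level of 0.05.
level = joint_significance_level(0.05, 0.05)
print(round(level, 4))  # 0.0975
```

The joint level is slightly below the sum $\alpha_1 + \alpha_2$, since the product term $\alpha_1 \alpha_2$ is subtracted.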

The independence of $F$ and $T$, when $\theta_3 = \theta_4$, can be established by
an appeal to sufficiency and completeness. The three statistics $\bar{X}$, $\bar{Y}$, and
$\sum_1^n (X_i - \bar{X})^2 + \sum_1^m (Y_i - \bar{Y})^2$ are joint complete sufficient statistics for
the three parameters $\theta_1$, $\theta_2$, and $\theta_3 = \theta_4$. Obviously, the distribution of
$F$ does not depend upon $\theta_1$, $\theta_2$, or $\theta_3 = \theta_4$, and hence $F$ is independent
of the three joint complete sufficient statistics. However, $T$ is a function
of these three joint complete sufficient statistics alone, and,
accordingly, $T$ is independent of $F$. It is important to note that these
two statistics are independent whether $\theta_1 = \theta_2$ or $\theta_1 \neq \theta_2$. This permits
us to calculate probabilities other than the significance level of the test.
For example, if $\theta_3 = \theta_4$ and $\theta_1 \neq \theta_2$, then

$$\Pr(c_1 < F < c_2, |T| > c) = \Pr(c_1 < F < c_2) \Pr(|T| > c).$$

The second factor in the right-hand member is evaluated by using the
probabilities for what is called a noncentral $t$-distribution. Of course,
if $\theta_3 = \theta_4$ and the difference $\theta_1 - \theta_2$ is large, we would want the
preceding probability to be close to 1 because the event
$\{c_1 < F < c_2, |T| > c\}$ leads to a correct decision, namely accept $\theta_3 = \theta_4$
and reject $\theta_1 = \theta_2$.

In this section we have given several examples in which the complete
sufficient statistics are independent of ancillary statistics. Thus, in
those cases, the ancillary statistics provide no information about
the parameters. However, if the sufficient statistics are not complete,
the ancillary statistics could provide some information as the following
example demonstrates.

Example 5. We refer back to Examples 1 and 2 of Section 7.8. There the
first and $n$th order statistics, $Y_1$ and $Y_n$, were minimal sufficient statistics for
$\theta$, where the sample arose from an underlying distribution having p.d.f.
$(\frac{1}{2}) I_{(\theta - 1, \theta + 1)}(x)$. Often $T_1 = (Y_1 + Y_n)/2$ is used as an estimator of $\theta$ as it is a
function of those sufficient statistics which is unbiased. Let us find a
relationship between $T_1$ and the ancillary statistic $T_2 = Y_n - Y_1$.

The joint p.d.f. of $Y_1$ and $Y_n$ is

$$g(y_1, y_n; \theta) = n(n - 1)(y_n - y_1)^{n-2}/2^n, \quad \theta - 1 < y_1 < y_n < \theta + 1,$$

zero elsewhere. Accordingly, the joint p.d.f. of $T_1$ and $T_2$ is, since the absolute
value of the Jacobian equals 1,

$$h(t_1, t_2; \theta) = n(n - 1) t_2^{n-2}/2^n, \quad \theta - 1 + \frac{t_2}{2} < t_1 < \theta + 1 - \frac{t_2}{2}, \quad 0 < t_2 < 2,$$

zero elsewhere. Thus the p.d.f. of $T_2$ is

$$h_2(t_2; \theta) = n(n - 1) t_2^{n-2}(2 - t_2)/2^n, \quad 0 < t_2 < 2,$$

zero elsewhere, which of course is free of $\theta$ as $T_2$ is an ancillary statistic. Thus
the conditional p.d.f. of $T_1$, given $T_2 = t_2$, is

$$h_{1|2}(t_1|t_2; \theta) = \frac{1}{2 - t_2}, \quad \theta - 1 + \frac{t_2}{2} < t_1 < \theta + 1 - \frac{t_2}{2}, \quad 0 < t_2 < 2,$$

zero elsewhere. Note that this is uniform on the interval $(\theta - 1 + t_2/2,
\theta + 1 - t_2/2)$; so the conditional mean and variance of $T_1$ are, respectively,

$$E(T_1|t_2) = \theta \quad \text{and} \quad \text{var}(T_1|t_2) = \frac{(2 - t_2)^2}{12}.$$

That is, given $T_2 = t_2$, we know something about the conditional variance of
$T_1$. In particular, if that observed value of $T_2$ is large (close to 2), that variance
is small and we can place more reliance on the estimator $T_1$. On the other
hand, a small value of $t_2$ means that we have less confidence in $T_1$ as
an estimator of $\theta$. It is extremely interesting to note that this conditional
variance does not depend upon the sample size $n$ but only on the given value
of $T_2 = t_2$. Of course, as the sample size increases, $T_2$ tends to become larger
and, in those cases, $T_1$ has smaller conditional variance.
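
The conditional variance $(2 - t_2)^2/12$ of the midrange $T_1$ given the range $T_2 = t_2$ in Example 5 can be illustrated by a rough simulation. This is a sketch under stated assumptions: the value $\theta = 5$, the sample size $n = 5$, the number of replications, the seed, and the bins used to approximate the conditioning on $T_2$ are all chosen only for the demonstration.

```python
import random


def simulate_t1_t2(theta, n, reps, seed=0):
    """Draw samples from uniform(theta - 1, theta + 1); return (T1, T2) pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        xs = [rng.uniform(theta - 1.0, theta + 1.0) for _ in range(n)]
        y1, yn = min(xs), max(xs)
        pairs.append(((y1 + yn) / 2.0, yn - y1))  # midrange T1 and range T2
    return pairs


def conditional_variance(pairs, lo, hi):
    """Empirical variance of T1 among samples whose range T2 lies in [lo, hi)."""
    t1 = [a for a, b in pairs if lo <= b < hi]
    m = sum(t1) / len(t1)
    return sum((a - m) ** 2 for a in t1) / len(t1)


pairs = simulate_t1_t2(theta=5.0, n=5, reps=100000)
small_range = conditional_variance(pairs, 0.5, 0.6)
large_range = conditional_variance(pairs, 1.5, 1.6)
# Theory: (2 - t2)^2 / 12, about 0.175 near t2 = 0.55 and about 0.017 near t2 = 1.55.
print(round(small_range, 3), round(large_range, 3))
```

A large observed range leaves little room for the midrange to move, which is why the second figure is roughly a tenth of the first.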
While Example 5 is a special one demonstrating mathematically
that an ancillary statistic can provide some help in point estimation,
this does actually happen in practice too. For illustration, we know that
if the sample size is large enough, then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n - 1}}$$

has an approximate standard normal distribution. Of course, if the
sample arises from a normal distribution, $\bar{X}$ and $S$ are independent and
$T$ has a $t$-distribution with $n - 1$ degrees of freedom. Even if the sample
arises from a symmetric distribution, $\bar{X}$ and $S$ are uncorrelated and $T$
has an approximate $t$-distribution and certainly an approximate
standard normal distribution with sample sizes around 30 or 40. On
the other hand, if the sample arises from a highly skewed distribution
(say to the right), then $\bar{X}$ and $S$ are highly correlated and the probability
$\Pr(-1.96 < T < 1.96)$ is not necessarily close to 0.95 unless the
sample size is extremely large (certainly much greater than 30).
Intuitively, one can understand why this correlation exists if the
underlying distribution is highly skewed to the right. While $S$ has a
distribution free of $\mu$ (and hence is an ancillary), a large value of $S$
implies a large value of $\bar{X}$, since the underlying p.d.f. is like the one
depicted in Figure 7.1. Of course, a small value of $\bar{X}$ (say less than the
mode) requires a relatively small value of $S$. This means that unless $n$
is extremely large, it is risky to say that

$$\bar{x} - \frac{1.96 s}{\sqrt{n - 1}}, \quad \bar{x} + \frac{1.96 s}{\sqrt{n - 1}}$$


provides an approximate 95 percent confidence interval with data from 
a very skewed distribution. As a matter of fact, the authors have seen 
situations in which this confidence coefficient is closer to 70 percent, 
rather than 95 percent, with sample sizes of 30 to 40. 
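
The loss of coverage for skewed data can be seen in a rough simulation. The sketch below is only illustrative: the lognormal distribution with log-scale standard deviation 2 is an assumed stand-in for "a very skewed distribution", and the sample size 30, replication count, and seed are likewise assumptions. It estimates how often the interval $\bar{x} \pm 1.96 s/\sqrt{n - 1}$ contains the true mean.

```python
import math
import random


def covers(sample, mu, z=1.96):
    """True if x_bar -/+ z * s / sqrt(n - 1) contains mu (s uses divisor n)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / n)
    half_width = z * s / math.sqrt(n - 1)
    return x_bar - half_width < mu < x_bar + half_width


def coverage(draw, mu, n=30, reps=4000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = sum(covers([draw(rng) for _ in range(n)], mu) for _ in range(reps))
    return hits / reps


normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), mu=0.0)
# Lognormal with log-scale standard deviation 2: its mean is exp(2^2 / 2) = exp(2).
skewed_cov = coverage(lambda rng: rng.lognormvariate(0.0, 2.0), mu=math.exp(2.0))
print(round(normal_cov, 3), round(skewed_cov, 3))
```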



FIGURE 7.1

EXERCISES

7.57. Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random sample
of size $n = 4$ from a distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero
elsewhere, where $0 < \theta < \infty$. Argue that the complete sufficient statistic $Y_4$
for $\theta$ is independent of each of the statistics $Y_1/Y_4$ and $(Y_1 + Y_2)/(Y_3 + Y_4)$.

Hint: Show that the p.d.f. is of the form $(1/\theta) f(x/\theta)$, where $f(x) = 1$,
$0 < x < 1$, zero elsewhere.

7.58. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from
the normal distribution $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$. Show that the distribution
of $Z = Y_n - \bar{Y}$ does not depend upon $\theta$. Thus $\bar{Y} = \sum_1^n Y_i/n$, a complete
sufficient statistic for $\theta$, is independent of $Z$.

7.59. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(\theta, \sigma^2)$, $-\infty < \theta < \infty$. Prove that a necessary and sufficient condition that
the statistics $Z = \sum_1^n a_i X_i$ and $Y = \sum_1^n X_i$, a complete sufficient statistic for $\theta$,
be independent is that $\sum_1^n a_i = 0$.

7.60. Let $X$ and $Y$ be random variables such that $E(X^k)$ and $E(Y^k) \neq 0$ exist
for $k = 1, 2, 3, \ldots$. If the ratio $X/Y$ and its denominator $Y$ are independent,
prove that $E[(X/Y)^k] = E(X^k)/E(Y^k)$, $k = 1, 2, 3, \ldots$.

Hint: Write $E(X^k) = E[Y^k (X/Y)^k]$.

7.61. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from a distribution that has p.d.f. $f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$,
$0 < \theta < \infty$, zero elsewhere. Show that the ratio $R = nY_1 / \sum_1^n Y_i$ and its
denominator (a complete sufficient statistic for $\theta$) are independent. Use the
result of the preceding exercise to determine $E(R^k)$, $k = 1, 2, 3, \ldots$.

7.62. Let $X_1, X_2, \ldots, X_5$ be a random sample of size 5 from the distribution
that has p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that
$(X_1 + X_2)/(X_1 + X_2 + \cdots + X_5)$ and its denominator are independent.

Hint: The p.d.f. $f(x)$ is a member of $\{f(x; \theta): 0 < \theta < \infty\}$, where
$f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere.

7.63. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from
the normal distribution $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Show that
the joint complete sufficient statistics $\bar{X} = \bar{Y}$ and $S^2$ for $\theta_1$ and $\theta_2$ are
independent of each of $(Y_n - \bar{Y})/S$ and $(Y_n - Y_1)/S$.

7.64. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from
a distribution with the p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2} \exp\left(-\frac{x - \theta_1}{\theta_2}\right),$$

$\theta_1 < x < \infty$, zero elsewhere, where $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Show that
the joint complete sufficient statistics $Y_1$ and $\bar{X} = \bar{Y}$ for $\theta_1$ and $\theta_2$ are
independent of $(Y_2 - Y_1) / \sum_1^n (Y_i - Y_1)$.

7.65. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from the normal
distribution $N(0, \theta)$.

(a) Argue that the ratio $R = (X_1^2 + X_2^2)/(X_1^2 + \cdots + X_5^2)$ and its
denominator $(X_1^2 + \cdots + X_5^2)$ are independent.

(b) Does $5R/2$ have an $F$-distribution with 2 and 5 degrees of freedom?
Explain your answer.

(c) Compute $E(R)$ using Exercise 7.60.

7.66. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from a distribution having p.d.f.

$$f(x; \theta) = (1/\theta) \exp(-x/\theta), \quad 0 < x < \infty,$$

and equal zero elsewhere, where $0 < \theta < \infty$. Show that $W = \sum_1^n Y_i$ and
$Z = nY_1 / \sum_1^n Y_i$ are independent. Find $E(Z^k)$, $k = 1, 2, 3, \ldots$ using the result
of Exercise 7.60. What is the distribution of $Z$?

7.67. Referring to Example 5 of this section, determine $c$ so that

$$\Pr(-c < T_1 - \theta < c \mid T_2 = t_2) = 0.95.$$

Use this result to find a 95 percent confidence interval for $\theta$, given $T_2 = t_2$;
and note how its length is smaller when the range $t_2$ is larger.

ADDITIONAL EXERCISES 

7.68. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta e^{-\theta x}$, $0 < x < \infty$, zero elsewhere, where $0 < \theta$.

(a) What is the complete sufficient statistic, say $Y$, for $\theta$?

(b) What function of $Y$ is an unbiased estimator of $\theta$?

7.69. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample
of size $n$ from a distribution with p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero
elsewhere. The statistic $Y_n$ is a complete sufficient statistic for $\theta$ and it has
p.d.f.

$$g(y_n; \theta) = \frac{n y_n^{n-1}}{\theta^n}, \quad 0 < y_n < \theta,$$

and zero elsewhere.

(a) Find the distribution function $H_n(z; \theta)$ of $Z = n(\theta - Y_n)$.

(b) Find the $\lim_{n \to \infty} H_n(z; \theta)$ and thus the limiting distribution of $Z$.

7.70. Let $X_1, \ldots, X_n$; $Y_1, \ldots, Y_n$; $Z_1, \ldots, Z_n$ be respective independent
random samples from three normal distributions $N(\mu_1 = \alpha + \beta, \sigma^2)$,
$N(\mu_2 = \beta + \gamma, \sigma^2)$, $N(\mu_3 = \alpha + \gamma, \sigma^2)$. Find a point estimator for $\beta$ that is
based on $\bar{X}$, $\bar{Y}$, $\bar{Z}$. Is this estimator unique? Why? If $\sigma^2$ is unknown, explain
how to find a confidence interval for $\beta$.

7.71. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution
with mean $\theta$. Find the conditional expectation $E(X_1 + 2X_2 + 3X_3 \mid \sum_1^n X_i)$.

7.72. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the normal
distribution $N(\theta, 1)$. Find the unbiased minimum variance estimator of $\theta^2$.

7.73. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution
with mean $\theta$. Find the unbiased minimum variance estimator of $\theta^2$.

7.74. We consider a random sample $X_1, X_2, \ldots, X_n$ from a distribution with
p.d.f. $f(x; \theta) = (1/\theta) \exp(-x/\theta)$, $0 < x < \infty$, zero elsewhere, where $0 < \theta$.
Possibly, in a life testing situation, however, we only observe the first $r$ order
statistics, $Y_1 < Y_2 < \cdots < Y_r$.

(a) Record the joint p.d.f. of these order statistics and denote it by $L(\theta)$.

(b) Under these conditions, find the m.l.e., $\hat{\theta}$, by maximizing $L(\theta)$.

(c) Find the m.g.f. and p.d.f. of $\hat{\theta}$.

(d) With a slight extension of the definition of sufficiency, is $\hat{\theta}$ a sufficient
statistic?

(e) Find the unbiased minimum variance estimator for $\theta$.

(f) Show that $Y_1/\hat{\theta}$ and $\hat{\theta}$ are independent.

7.75. Let us repeat Bernoulli trials with parameter $\theta$ until $k$ successes occur.
If $Y$ is the number of trials needed:

(a) Show that the p.d.f. of $Y$ is $g(y; \theta) = \binom{y - 1}{k - 1} \theta^k (1 - \theta)^{y-k}$, $y = k,
k + 1, \ldots$, zero elsewhere, where $0 < \theta \le 1$.

(b) Prove that this family of probability density functions is complete.

(c) Demonstrate that $E[(k - 1)/(Y - 1)] = \theta$.

(d) Is it possible to find another statistic, which is a function of $Y$ alone,
that is unbiased? Why?

7.76. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta^x (1 - \theta)$, $x = 0, 1, 2, \ldots$, zero elsewhere, where $0 < \theta \le 1$.

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$.

(b) Show that $\sum_{i=1}^{n} X_i$ is a complete sufficient statistic for $\theta$.

(c) Determine the unbiased minimum variance estimator of $\theta$.

7.77. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with p.d.f.
$f(x; \theta) = \frac{1}{2} \theta^3 x^2 e^{-\theta x}$, $0 < x < \infty$, zero elsewhere, where $0 < \theta < \infty$:

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$. Is $\hat{\theta}$ unbiased?

Hint: First find the p.d.f. of $Y = \sum_{i=1}^{n} X_i$ and then compute $E(\hat{\theta})$.

(b) Argue that $Y$ is a complete sufficient statistic for $\theta$.

(c) Find the unbiased minimum variance estimator of $\theta$.

(d) Show that $X_1/Y$ and $Y$ are independent.

(e) What is the distribution of $X_1/Y$?

More About
Estimation


8.1 Bayesian Estimation 

In Chapter 6 we introduced point and interval estimation for 
various parameters. In Chapter 7 we observed how such inferences 
should be based upon sufficient statistics for the parameters if they 
exist. In this chapter we introduce other concepts related to estimation 
and begin this by considering Bayesian estimates, which are also based 
upon sufficient statistics if the latter exist. 

In introducing the interesting and sometimes controversial
Bayesian method of estimation, the student should constantly keep
in mind that making statistical inferences from the data does not
strictly follow a mathematical approach. Clearly, up to now, we have
had to construct models before we have been able to make such
inferences. These models are subjective, and the resulting inference
depends greatly on the model selected. For illustration, two
statisticians could very well select different models for exactly the same
situation and make different inferences with exactly the same data.
Most statisticians would use some type of model diagnostics to see if
the models seem to be reasonable ones, but we must still recognize
that there can be differences among statisticians' inferences.

We shall now describe the Bayesian approach to the problem of
estimation. This approach takes into account any prior knowledge of
the experiment that the statistician has and it is one application of a
principle of statistical inference that may be called Bayesian statistics.
Consider a random variable $X$ that has a distribution of probability
that depends upon the symbol $\theta$, where $\theta$ is an element of a well-defined
set $\Omega$. For example, if the symbol $\theta$ is the mean of a normal distribution,
$\Omega$ may be the real line. We have previously looked upon $\theta$ as being some
constant, although an unknown constant. Let us now introduce a
random variable $\Theta$ that has a distribution of probability over the set
$\Omega$; and, just as we look upon $x$ as a possible value of the random
variable $X$, we now look upon $\theta$ as a possible value of the random
variable $\Theta$. Thus the distribution of $X$ depends upon $\theta$, an experimental
value of the random variable $\Theta$. We shall denote the p.d.f. of $\Theta$ by $h(\theta)$
and we take $h(\theta) = 0$ when $\theta$ is not an element of $\Omega$. Moreover, we now
denote the p.d.f. of $X$ by $f(x|\theta)$ since we think of it as a conditional p.d.f.
of $X$, given $\Theta = \theta$.

Say $X_1, X_2, \ldots, X_n$ is a random sample from this conditional
distribution of $X$. Thus we can write the joint conditional p.d.f. of
$X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, as

$$f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta).$$

Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and $\Theta$ is

$$g(x_1, x_2, \ldots, x_n, \theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta).$$

If $\Theta$ is a random variable of the continuous type, the joint marginal
p.d.f. of $X_1, X_2, \ldots, X_n$ is given by

$$g_1(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n, \theta) \, d\theta.$$

If $\Theta$ is a random variable of the discrete type, integration would be
replaced by summation. In either case the conditional p.d.f. of $\Theta$, given
$X_1 = x_1, \ldots, X_n = x_n$, is

$$k(\theta|x_1, x_2, \ldots, x_n) = \frac{g(x_1, x_2, \ldots, x_n, \theta)}{g_1(x_1, x_2, \ldots, x_n)} = \frac{f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta)}{g_1(x_1, x_2, \ldots, x_n)}.$$

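This relationship is another form of Bayes' formula.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson
distribution with mean $\theta$, where $\theta$ is the observed value of a random variable
$\Theta$ having a gamma distribution with known parameters $\alpha$ and $\beta$. Thus

$$g(x_1, \ldots, x_n, \theta) = \left[\frac{\theta^{x_1} e^{-\theta}}{x_1!} \cdots \frac{\theta^{x_n} e^{-\theta}}{x_n!}\right] \left[\frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}}\right],$$

provided that $x_i = 0, 1, 2, 3, \ldots$, $i = 1, 2, \ldots, n$ and $0 < \theta < \infty$, and is equal
to zero elsewhere. Then

$$g_1(x_1, \ldots, x_n) = \int_0^{\infty} \frac{\theta^{\sum x_i + \alpha - 1} e^{-(n + 1/\beta)\theta}}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha}} \, d\theta = \frac{\Gamma\left(\sum x_i + \alpha\right)}{x_1! \cdots x_n! \, \Gamma(\alpha) \beta^{\alpha} (n + 1/\beta)^{\sum x_i + \alpha}}.$$

Finally, the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

$$k(\theta|x_1, \ldots, x_n) = \frac{g(x_1, \ldots, x_n, \theta)}{g_1(x_1, \ldots, x_n)} = \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]}}{\Gamma\left(\sum x_i + \alpha\right) [\beta/(n\beta + 1)]^{\sum x_i + \alpha}},$$

provided that $0 < \theta < \infty$, and is equal to zero elsewhere. This conditional
p.d.f. is one of the gamma type with parameters $\alpha^* = \sum x_i + \alpha$ and
$\beta^* = \beta/(n\beta + 1)$.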
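
The update from the prior parameters $(\alpha, \beta)$ to the posterior parameters $(\alpha^*, \beta^*)$ in Example 1 is simple enough to sketch in a few lines. The function name and the data and prior values below are hypothetical, chosen only to show the arithmetic; the gamma distribution is parameterized by shape and scale, as in the text.

```python
def gamma_poisson_posterior(xs, alpha, beta):
    """Posterior gamma parameters (alpha*, beta*) for Poisson counts xs.

    The prior is gamma with shape alpha and scale beta, as in Example 1.
    """
    n = len(xs)
    alpha_star = sum(xs) + alpha
    beta_star = beta / (n * beta + 1.0)
    return alpha_star, beta_star


# Hypothetical data and prior, chosen only for illustration.
alpha_star, beta_star = gamma_poisson_posterior([1, 2, 3], alpha=2.0, beta=1.0)
posterior_mean = alpha_star * beta_star  # mean of a gamma(shape, scale) law
print(alpha_star, beta_star, posterior_mean)  # 8.0 0.25 2.0
```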

In Example 1 it is extremely convenient to notice that it is not really
necessary to determine $g_1(x_1, \ldots, x_n)$ to find $k(\theta|x_1, \ldots, x_n)$. If we
divide

$$f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta) h(\theta)$$

by $g_1(x_1, \ldots, x_n)$, we must get the product of a factor, which depends
upon $x_1, \ldots, x_n$ but does not depend upon $\theta$, say $c(x_1, \ldots, x_n)$, and

$$\theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]}.$$

That is,

$$k(\theta|x_1, \ldots, x_n) = c(x_1, \ldots, x_n) \, \theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]},$$

provided that $0 < \theta < \infty$ and $x_i = 0, 1, 2, \ldots$, $i = 1, 2, \ldots, n$.
However, $c(x_1, \ldots, x_n)$ must be that "constant" needed to make
$k(\theta|x_1, \ldots, x_n)$ a p.d.f., namely

$$c(x_1, \ldots, x_n) = \frac{1}{\Gamma\left(\sum x_i + \alpha\right) [\beta/(n\beta + 1)]^{\sum x_i + \alpha}}.$$

Accordingly, Bayesian statisticians frequently write that
$k(\theta|x_1, \ldots, x_n)$ is proportional to

$$g(x_1, x_2, \ldots, x_n, \theta);$$

that is,

$$k(\theta|x_1, \ldots, x_n) \propto f(x_1|\theta) \cdots f(x_n|\theta) h(\theta).$$

Note that in the right-hand member of this expression all factors
involving constants and $x_1, \ldots, x_n$ alone (not $\theta$) can be dropped. For
illustration, in solving the problem presented in Example 1, the
Bayesian statistician would simply write

$$k(\theta|x_1, \ldots, x_n) \propto \theta^{\sum x_i} e^{-n\theta} \, \theta^{\alpha - 1} e^{-\theta/\beta}$$

or, equivalently,

$$k(\theta|x_1, \ldots, x_n) \propto \theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]},$$

$0 < \theta < \infty$ and is equal to zero elsewhere. Clearly, $k(\theta|x_1, \ldots, x_n)$
must be a gamma p.d.f. with parameters $\alpha^* = \sum x_i + \alpha$ and
$\beta^* = \beta/(n\beta + 1)$.
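
The proportionality argument can also be checked numerically: working only with the unnormalized kernel $\theta^{\sum x_i + \alpha - 1} e^{-(n + 1/\beta)\theta}$ and normalizing on a grid should reproduce the gamma posterior mean $\alpha^* \beta^*$. The sketch below assumes hypothetical data and prior values, and the grid limits and step count are arbitrary choices for the illustration.

```python
import math


def grid_posterior_mean(xs, alpha, beta, upper=40.0, steps=20000):
    """Posterior mean of theta from the unnormalized kernel, via a midpoint rule."""
    n, total = len(xs), sum(xs)
    width = upper / steps
    mass = 0.0
    moment = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width
        # Kernel theta^(sum x + alpha - 1) * exp(-(n + 1/beta) * theta);
        # every factor free of theta has been dropped.
        log_kernel = (total + alpha - 1.0) * math.log(theta) - (n + 1.0 / beta) * theta
        kernel = math.exp(log_kernel)
        mass += kernel * width
        moment += theta * kernel * width
    return moment / mass


# Hypothetical data and prior; the gamma form gives alpha* beta* = 8 * 0.25 = 2.
numeric_mean = grid_posterior_mean([1, 2, 3], alpha=2.0, beta=1.0)
print(round(numeric_mean, 4))
```

The normalizing constant never has to be written down: it cancels in the ratio `moment / mass`.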

There is another observation that can be made at this point.
Suppose that there exists a sufficient statistic $Y = u(X_1, \ldots, X_n)$ for the
parameter so that

$$f(x_1|\theta) \cdots f(x_n|\theta) = g[u(x_1, \ldots, x_n)|\theta] H(x_1, \ldots, x_n),$$

where now $g(y|\theta)$ is the p.d.f. of $Y$, given $\Theta = \theta$. Then we note that

$$k(\theta|x_1, \ldots, x_n) \propto g[u(x_1, \ldots, x_n)|\theta] h(\theta)$$

because the factor $H(x_1, \ldots, x_n)$ that does not depend upon $\theta$ can be
dropped. Thus, if a sufficient statistic $Y$ for the parameter exists, we can
begin with the p.d.f. of $Y$ if we wish and write

$$k(\theta|y) \propto g(y|\theta) h(\theta),$$

where now $k(\theta|y)$ is the conditional p.d.f. of $\Theta$, given the sufficient
statistic $Y = y$. The following discussion assumes that a sufficient
statistic $Y$ does exist; but more generally, we could replace $Y$ by
$X_1, X_2, \ldots, X_n$ in what follows. Also, we now use $g_1(y)$ to be the
marginal p.d.f. of $Y$; that is, in the continuous case,

$$g_1(y) = \int_{-\infty}^{\infty} g(y|\theta) h(\theta) \, d\theta.$$

In Bayesian statistics, the p.d.f. $h(\theta)$ is called the prior p.d.f.
of $\Theta$, and the conditional p.d.f. $k(\theta|y)$ is called the posterior
p.d.f. of $\Theta$. This is because $h(\theta)$ is the p.d.f. of $\Theta$ prior to the observation
of $Y$, whereas $k(\theta|y)$ is the p.d.f. of $\Theta$ after the observation of $Y$ has been
made. In many instances, $h(\theta)$ is not known; yet the choice of $h(\theta)$
affects the p.d.f. $k(\theta|y)$. In these instances the statistician takes into
account all prior knowledge of the experiment and assigns the prior
p.d.f. $h(\theta)$. This, of course, injects the problem of personal or subjective
probability (see the Remark, Section 1.1).

Suppose that we want a point estimate of $\theta$. From the Bayesian
viewpoint, this really amounts to selecting a decision function $\delta$, so that
$\delta(y)$ is a predicted value of $\theta$ (an experimental value of the random
variable $\Theta$) when both the computed value $y$ and the conditional p.d.f.
$k(\theta|y)$ are known. Now, in general, how would we predict an
experimental value of any random variable, say $W$, if we want our
prediction to be "reasonably close" to the value to be observed? Many
statisticians would predict the mean, $E(W)$, of the distribution of $W$;
others would predict a median (perhaps unique) of the distribution of
$W$; some would predict a mode (perhaps unique) of the distribution of
$W$; and some would have other predictions. However, it seems
desirable that the choice of the decision function should depend upon
the loss function $\mathcal{L}[\theta, \delta(y)]$. One way in which this dependence upon
the loss function can be reflected is to select the decision function $\delta$ in
such a way that the conditional expectation of the loss is a minimum.
A Bayes' solution is a decision function $\delta$ that minimizes

$$E\{\mathcal{L}[\Theta, \delta(y)] \mid Y = y\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)] k(\theta|y) \, d\theta,$$

if $\Theta$ is a random variable of the continuous type. The usual
modification of the right-hand member of this equation is made for
random variables of the discrete type. If, for example, the loss function
is given by $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$, the Bayes' solution is given by
$\delta(y) = E(\Theta|y)$, the mean of the conditional distribution of $\Theta$, given
$Y = y$. This follows from the fact that $E[(W - b)^2]$, if it exists, is
a minimum when $b = E(W)$. If the loss function is given by
$\mathcal{L}[\theta, \delta(y)] = |\theta - \delta(y)|$, then a median of the conditional distribution
of $\Theta$, given $Y = y$, is the Bayes' solution. This follows from the fact
that $E(|W - b|)$, if it exists, is a minimum when $b$ is equal to any median
of the distribution of $W$.
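
The two facts just quoted, that $E[(W - b)^2]$ is minimized at the mean and $E(|W - b|)$ at a median, can be illustrated with a brute-force search over candidate values of $b$. The discrete distribution for $W$ and the search grid below are hypothetical, chosen so that the mean (2.0) and the unique median (1.0) differ.

```python
def expected_loss(values, probs, b, loss):
    """Expected loss E[loss(W, b)] for a discrete random variable W."""
    return sum(p * loss(w, b) for w, p in zip(values, probs))


def best_guess(values, probs, loss, grid):
    """Grid point b that minimizes the expected loss."""
    return min(grid, key=lambda b: expected_loss(values, probs, b, loss))


def square_error(w, b):
    return (w - b) ** 2


def absolute_error(w, b):
    return abs(w - b)


# Hypothetical skewed distribution for W: mean 2.0, unique median 1.0.
values = [0.0, 1.0, 2.0, 10.0]
probs = [0.2, 0.4, 0.3, 0.1]
grid = [i / 100.0 for i in range(0, 1001)]

mean_guess = best_guess(values, probs, square_error, grid)
median_guess = best_guess(values, probs, absolute_error, grid)
print(mean_guess, median_guess)
```

The long right tail pulls the square-error minimizer toward the large value, while the absolute-error minimizer stays at the median.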


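The conditional expectation of the loss, given $Y = y$, defines a
random variable that is a function of the statistic $Y$. The expected value
of that function of $Y$, in the notation of this section, is given by

$$\int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)] k(\theta|y) \, d\theta \right\} g_1(y) \, dy = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)] g(y|\theta) \, dy \right\} h(\theta) \, d\theta,$$

in the continuous case. The integral within the braces in the latter
expression is, for every given $\theta \in \Omega$, the risk function $R(\theta, \delta)$;
accordingly, the latter expression is the mean value of the risk, or the
expected risk. Because a Bayes' solution minimizes

$$\int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)] k(\theta|y) \, d\theta$$

for every $y$ for which $g_1(y) > 0$, it is evident that a Bayes' solution $\delta(y)$
minimizes this mean value of the risk. We now give an illustrative
example.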

Example 2. Let X 2 ,... ,X n denote a random sample from a 
distribution that is b(\, 6),0<6<l. We seek a decision function 6 that is a 

n 

Bayes’ solution. The sufficient statistic y =[ 不 ， and y is b(n, 6). That is, the 

i 

conditional p.d.f. of Y, given © = 沒 ， is 

g(y\e) = 0(1 一 ey-y, y = o,\,...,n, 


— 0 elsewhere. 

We take the prior p.d.f. of the random variable @ to be 

r(a + 灼 


m 


mm 

0 elsewhere 


- ey-\ o<0< i. 


where a. and P are assigned positive constants. Thus the conditional p.d.f. of 
0, given Y = y, is, at points of positive probability density, 

k(e\y) oc ^(i - - ey-', o<e<\. 
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That is. 


my) 


r(n + a + P) 


r(a + jv)r(« + p-y) 


铲 + 少 - 1(1 _ ey 


0 < 0 < 1, 


and 少 = 0, 1， • •. ， n. We take the loss function to be if[0, 占 ( 少 ) ]=[6 — 占 Cv)] 2 . 
Because y is a random variable of the discrete type, whereas © is of the 
continuous type, we have for the expected risk, 




t [e-6(y)f(^jey(i-er-y[h(e)de 


1 


[e-d(y)n(e\y)de} gl (y). 


The Bayes’ solution 占 ( 少 ) is the mean of the conditional distribution of ©, given 
Y ^ y. Thus 

d(y)= 0k(0\y) d6 

^0 


r(/i + a + P) 「 
r(ot + 少 ) r(« + 彡 - 少 ) 丄 


俨 +>(i -dy +n -^-'d$ 


a + y 
<x + P + n 


This decision function $\delta(y)$ minimizes

$$\int_{0}^{1}[\theta-\delta(y)]^2\,k(\theta|y)\,d\theta$$

for $y=0,1,\ldots,n$ and, accordingly, it minimizes the expected risk. It is very
instructive to note that this Bayes' solution can be written as

$$\delta(y)=\left(\frac{n}{\alpha+\beta+n}\right)\frac{y}{n}+\left(\frac{\alpha+\beta}{\alpha+\beta+n}\right)\frac{\alpha}{\alpha+\beta},$$

which is a weighted average of the maximum likelihood estimate $y/n$ of $\theta$ and
the mean $\alpha/(\alpha+\beta)$ of the prior p.d.f. of the parameter. Moreover, the
respective weights are $n/(\alpha+\beta+n)$ and $(\alpha+\beta)/(\alpha+\beta+n)$. Thus we see that
$\alpha$ and $\beta$ should be selected so that not only is $\alpha/(\alpha+\beta)$ the desired prior mean,
but the sum $\alpha+\beta$ indicates the worth of the prior opinion, relative to a sample
of size $n$. That is, if we want our prior opinion to have as much weight as a
sample size of 20, we would take $\alpha+\beta=20$. So if our prior mean is $\tfrac{3}{4}$, we have
that $\alpha$ and $\beta$ are selected so that $\alpha=15$ and $\beta=5$.
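
As a numerical check on Example 2, the following Python sketch computes the Bayes' solution both as the posterior mean and as the weighted average of $y/n$ and the prior mean. It is an illustration only: the function names and the sample values $n=30$, $y=12$ are assumptions introduced here, not part of the text.

```python
def bayes_estimate(y, n, alpha, beta):
    """Posterior mean (alpha + y) / (alpha + beta + n) under a beta(alpha, beta) prior."""
    return (alpha + y) / (alpha + beta + n)


def weighted_average_form(y, n, alpha, beta):
    """Same estimate written as a weighted average of the m.l.e. y/n and the prior mean."""
    total = alpha + beta + n
    w_sample = n / total              # weight on the m.l.e. y/n
    w_prior = (alpha + beta) / total  # weight on the prior mean
    return w_sample * (y / n) + w_prior * (alpha / (alpha + beta))


# Prior mean 3/4 carrying the weight of 20 observations: alpha = 15, beta = 5.
# The data values n = 30, y = 12 are hypothetical.
print(bayes_estimate(12, 30, 15, 5))
print(weighted_average_form(12, 30, 15, 5))
```

Both calls should agree up to floating-point rounding, and the estimate lies between the m.l.e. $12/30$ and the prior mean $3/4$.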

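Example 3. Suppose that $Y=\bar{X}$, the sufficient statistic, is the mean of a
random sample of size $n$ that arises from the normal distribution $N(\theta,\sigma^2)$,
where $\sigma^2$ is known. Then $g(y|\theta)$ is $N(\theta,\sigma^2/n)$. Further suppose that we
are able to assign prior knowledge to $\theta$ through a prior p.d.f. $h(\theta)$ that is
$N(\theta_0,\sigma_0^2)$. Then we have that

$$k(\theta|y)\propto\frac{1}{\sqrt{2\pi}\,\sigma/\sqrt{n}}\,\frac{1}{\sqrt{2\pi}\,\sigma_0}
\exp\left[-\frac{(y-\theta)^2}{2(\sigma^2/n)}-\frac{(\theta-\theta_0)^2}{2\sigma_0^2}\right].$$

If we eliminate all constant factors (including factors involving $y$ only), we
have

$$k(\theta|y)\propto\exp\left[-\frac{(\sigma_0^2+\sigma^2/n)\theta^2-2(y\sigma_0^2+\theta_0\sigma^2/n)\theta}{2(\sigma^2/n)\sigma_0^2}\right].$$

This can be simplified, by completing the square, to read (after eliminating
factors not involving $\theta$)

$$k(\theta|y)\propto\exp\left[-\frac{\left(\theta-\dfrac{y\sigma_0^2+\theta_0\sigma^2/n}{\sigma_0^2+\sigma^2/n}\right)^2}{\dfrac{2(\sigma^2/n)\sigma_0^2}{\sigma_0^2+\sigma^2/n}}\right].$$

That is, the posterior p.d.f. of the parameter is obviously normal with mean

$$\frac{y\sigma_0^2+\theta_0\sigma^2/n}{\sigma_0^2+\sigma^2/n}
=\left(\frac{\sigma_0^2}{\sigma_0^2+\sigma^2/n}\right)y+\left(\frac{\sigma^2/n}{\sigma_0^2+\sigma^2/n}\right)\theta_0$$

and variance $(\sigma^2/n)\sigma_0^2/(\sigma_0^2+\sigma^2/n)$. If the square-error loss function is used, this
posterior mean is the Bayes' solution. Again, note that it is a weighted average
of the maximum likelihood estimate $y=\bar{x}$ and the prior mean $\theta_0$. Observe here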
and in Example 2 that the Bayes’ solution gets closer to the maximum 
likelihood estimate as n increases. Thus the Bayesian procedures permit the 
decision maker to enter his or her prior opinions into the solution in a very 
formal way such that the influences of these prior notions will be less and less 
as n increases. 

In Bayesian statistics all the information is contained in the 
posterior p.d.f. $k(\theta|y)$. In Examples 2 and 3 we found Bayesian point
estimates using the square-error loss function. It should be noted that
if $\mathcal{L}[\delta(y),\theta]=|\delta(y)-\theta|$, the absolute value of the error, then the
Bayes' solution would be the median of the posterior distribution of
the parameter, which is given by $k(\theta|y)$. Hence the Bayes' solution
changes, as it should, with different loss functions.

If an interval estimate of $\theta$ is desired, we can now find two functions
$u(y)$ and $v(y)$ so that the conditional probability

$$\Pr[u(y)<\Theta<v(y)\mid Y=y]=\int_{u(y)}^{v(y)}k(\theta|y)\,d\theta,$$

is large, say 0.95. The experimental values of $X_1,X_2,\ldots,X_n$, say
$x_1,x_2,\ldots,x_n$, provide us with an experimental value of $Y$, say $y$. Then
the interval $u(y)$ to $v(y)$ is an interval estimate of $\theta$ in the sense that the
conditional probability of $\Theta$ belonging to that interval is equal to 0.95.
For illustration, in Example 3 where the posterior p.d.f. of the
parameter was normal, the interval, whose end points are found by
taking the mean of that distribution and adding and subtracting 1.96
of its standard deviation,

$$\frac{y\sigma_0^2+\theta_0\sigma^2/n}{\sigma_0^2+\sigma^2/n}\pm1.96\sqrt{\frac{(\sigma^2/n)\sigma_0^2}{\sigma_0^2+\sigma^2/n}}$$

serves as an interval estimate for $\theta$ with posterior probability of 0.95.
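
The normal-prior interval of Example 3 can be evaluated directly. The sketch below is a minimal illustration, assuming hypothetical inputs ($y=10$, $n=25$, $\sigma^2=4$, $\theta_0=8$, $\sigma_0^2=1$) and function names chosen here for convenience.

```python
import math


def normal_posterior(y, n, sigma2, theta0, sigma0_2):
    """Posterior mean and variance of theta when Y is N(theta, sigma2/n)
    and the prior on theta is N(theta0, sigma0_2)."""
    s2n = sigma2 / n
    mean = (y * sigma0_2 + theta0 * s2n) / (sigma0_2 + s2n)
    var = s2n * sigma0_2 / (sigma0_2 + s2n)
    return mean, var


def credible_interval_95(y, n, sigma2, theta0, sigma0_2):
    """Posterior mean plus and minus 1.96 posterior standard deviations."""
    mean, var = normal_posterior(y, n, sigma2, theta0, sigma0_2)
    half = 1.96 * math.sqrt(var)
    return mean - half, mean + half


# Hypothetical inputs, used only to exercise the formulas.
print(normal_posterior(10.0, 25, 4.0, 8.0, 1.0))
print(credible_interval_95(10.0, 25, 4.0, 8.0, 1.0))
```

The posterior mean falls between the prior mean and the sample mean, and the posterior variance is smaller than both $\sigma^2/n$ and $\sigma_0^2$.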
EXERCISES 

8.1. Let $X_1,X_2,\ldots,X_n$ be a random sample from a distribution that is $b(1,\theta)$.
Let the prior p.d.f. of $\Theta$ be a beta one with parameters $\alpha$ and $\beta$. Show that
the posterior p.d.f. $k(\theta|x_1,x_2,\ldots,x_n)$ is exactly the same as $k(\theta|y)$ given in
Example 2.

8.2. Let $X_1,X_2,\ldots,X_n$ denote a random sample from a distribution that is
$N(\theta,\sigma^2)$, $-\infty<\theta<\infty$, where $\sigma^2$ is a given positive number. Let $Y=\bar{X}$,
the mean of the random sample. Take the loss function to be
$\mathcal{L}[\theta,\delta(y)]=|\theta-\delta(y)|$. If $\theta$ is an observed value of the random variable $\Theta$
that is $N(\mu,\tau^2)$, where $\tau^2>0$ and $\mu$ are known numbers, find the Bayes'
solution $\delta(y)$ for a point estimate of $\theta$.

8.3. Let $X_1,X_2,\ldots,X_n$ denote a random sample from a Poisson distribution
with mean $\theta$, $0<\theta<\infty$. Let $Y=\sum_{i=1}^{n}X_i$ and take the loss function to be
$\mathcal{L}[\theta,\delta(y)]=[\theta-\delta(y)]^2$. Let $\theta$ be an observed value of the random variable
$\Theta$. If $\Theta$ has the p.d.f. $h(\theta)=\theta^{\alpha-1}e^{-\theta/\beta}/[\Gamma(\alpha)\beta^{\alpha}]$, $0<\theta<\infty$, zero elsewhere,
where $\alpha>0$, $\beta>0$ are known numbers, find the Bayes' solution $\delta(y)$ for
a point estimate of $\theta$.

8.4. Let $Y_n$ be the $n$th order statistic of a random sample of size $n$ from a
distribution with p.d.f. $f(x|\theta)=1/\theta$, $0<x<\theta$, zero elsewhere. Take the
loss function to be $\mathcal{L}[\theta,\delta(y_n)]=[\theta-\delta(y_n)]^2$. Let $\theta$ be an observed value of
the random variable $\Theta$, which has p.d.f. $h(\theta)=\beta\alpha^{\beta}/\theta^{\beta+1}$, $\alpha<\theta<\infty$, zero
elsewhere, with $\alpha>0$, $\beta>0$. Find the Bayes' solution $\delta(y_n)$ for a point
estimate of $\theta$.

8.5. Let $Y_1$ and $Y_2$ be statistics that have a trinomial distribution with
parameters $n$, $\theta_1$, and $\theta_2$. Here $\theta_1$ and $\theta_2$ are observed values of the random
variables $\Theta_1$ and $\Theta_2$, which have a Dirichlet distribution with known
parameters $\alpha_1$, $\alpha_2$, and $\alpha_3$ (see Example 1, Section 4.5). Show that the
conditional distribution of $\Theta_1$ and $\Theta_2$ is Dirichlet and determine the
conditional means $E(\Theta_1|y_1,y_2)$ and $E(\Theta_2|y_1,y_2)$.

8.6. Let $X$ be $N(0,1/\theta)$. Assume that the unknown $\theta$ is a value of a random
variable $\Theta$ which has a gamma distribution with parameters $\alpha=r/2$ and
$\beta=2/r$, where $r$ is a positive integer. Show that $X$ has a marginal
$t$-distribution with $r$ degrees of freedom. This procedure is called
compounding, and it may be used by a Bayesian statistician as a way of first
presenting the $t$-distribution, as well as other distributions.

8.7. Let $X$ have a Poisson distribution with parameter $\theta$. Assume that the
unknown $\theta$ is a value of a random variable $\Theta$ that has a gamma distribution
with parameters $\alpha=r$ and $\beta=(1-p)/p$, where $r$ is a positive integer and
$0<p<1$. Show, by the procedure of compounding, that $X$ has a marginal
distribution which is negative binomial, a distribution that was introduced
earlier (Section 3.1) under very different assumptions.

8.8. In Example 2 let $n=30$, $\alpha=10$, and $\beta=5$ so that $\delta(y)=(10+y)/45$ is
the Bayes' estimate of $\theta$.

(a) If $Y$ has the binomial distribution $b(30,\theta)$, compute the risk
$E\{[\theta-\delta(Y)]^2\}$.

(b) Determine those values of $\theta$ for which the risk of part (a) is less than
$\theta(1-\theta)/30$, the risk associated with the maximum likelihood estimator
$Y/n$ of $\theta$.

8.9. Let $Y_4$ be the largest order statistic of a sample of size $n=4$ from a
distribution with uniform p.d.f. $f(x;\theta)=1/\theta$, $0<x<\theta$, zero elsewhere. If
the prior p.d.f. of the parameter is $g(\theta)=2/\theta^3$, $1<\theta<\infty$, zero elsewhere,
find the Bayesian estimator $\delta(Y_4)$ of $\theta$, based upon the sufficient statistic $Y_4$,
using the loss function $|\delta(y_4)-\theta|$.

8.10. Consider a random sample $X_1,X_2,\ldots,X_n$ from the Weibull distribution
with p.d.f. $f(x;\theta,\tau)=\theta\tau x^{\tau-1}e^{-\theta x^{\tau}}$, $0<x<\infty$, where $0<\theta$, $0<\tau$, zero
elsewhere.

(a) If $\tau$ is known, find the m.l.e. of $\theta$.

(b) If the parameter $\theta$ has a prior gamma p.d.f. $g(\theta)$ with parameters $\alpha$ and
$\beta^*=1/\beta$, show that the compound distribution is a Burr type with p.d.f.
$h(x)=\alpha\tau\beta^{\alpha}x^{\tau-1}/(x^{\tau}+\beta)^{\alpha+1}$, $0<x<\infty$, zero elsewhere.

(c) If, in the Burr distribution, $\tau$ and $\beta$ are known, find the m.l.e. of $\alpha$ based
on a random sample of size $n$.

8.2 Fisher Information and the Rao-Cramer Inequality 


Let $X$ be a random variable with p.d.f. $f(x;\theta)$, $\theta\in\Omega$, where the
parameter space $\Omega$ is an interval. We consider only special cases,
sometimes called regular cases, of probability density functions as we
wish to differentiate under an integral (summation) sign. In particular,
this means that the parameter $\theta$ does not appear in endpoints of the
interval in which $f(x;\theta)>0$.

With these assumptions, we have (in the continuous case, but the 
discrete case can be handled in a similar manner) that 


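$$\int_{-\infty}^{\infty}f(x;\theta)\,dx=1$$

and, by taking the derivative with respect to $\theta$,

$$\int_{-\infty}^{\infty}\frac{\partial f(x;\theta)}{\partial\theta}\,dx=0. \tag{1}$$

The latter expression can be rewritten as

$$\int_{-\infty}^{\infty}\frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\,f(x;\theta)\,dx=0$$

or, equivalently,

$$\int_{-\infty}^{\infty}\frac{\partial\ln f(x;\theta)}{\partial\theta}\,f(x;\theta)\,dx=0.$$

If we differentiate again, it follows that

$$\int_{-\infty}^{\infty}\left\{\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}\,f(x;\theta)+\frac{\partial\ln f(x;\theta)}{\partial\theta}\,\frac{\partial f(x;\theta)}{\partial\theta}\right\}dx=0. \tag{2}$$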


We rewrite the second term of the left-hand member of this equation 
as 


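$$\int_{-\infty}^{\infty}\frac{\partial\ln f(x;\theta)}{\partial\theta}\,\frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\,f(x;\theta)\,dx
=\int_{-\infty}^{\infty}\left[\frac{\partial\ln f(x;\theta)}{\partial\theta}\right]^2f(x;\theta)\,dx.$$

This is called Fisher information and is denoted by $I(\theta)$. That is,

$$I(\theta)=\int_{-\infty}^{\infty}\left[\frac{\partial\ln f(x;\theta)}{\partial\theta}\right]^2f(x;\theta)\,dx;$$

but, from Equation (2), we see that $I(\theta)$ can be computed from

$$I(\theta)=-\int_{-\infty}^{\infty}\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}\,f(x;\theta)\,dx.$$

Sometimes, one expression is easier to compute than the other, but
often we prefer the second expression.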

Remark. Note that the information is the weighted mean of either

$$\left[\frac{\partial\ln f(x;\theta)}{\partial\theta}\right]^2\qquad\text{or}\qquad-\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2},$$

where the weights are given by the p.d.f. $f(x;\theta)$. That is, the greater these
derivatives on the average, the more information that we get about $\theta$. Clearly,
if they were equal to zero [so that $\theta$ would not be in $\ln f(x;\theta)$], there would
be zero information about $\theta$. As we study more and more statistics, we learn
to recognize that the function

$$\frac{\partial\ln f(x;\theta)}{\partial\theta}$$

is a very important one. For example, it played a major role in finding the m.l.e.
by solving

$$\sum_{i=1}^{n}\frac{\partial\ln f(x_i;\theta)}{\partial\theta}=0$$

because each is very easy. Of course, the information is greater with smaller
values of $\sigma^2$.


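Example 2. Let $X$ be binomial $b(1,\theta)$. Thus

$$\ln f(x;\theta)=x\ln\theta+(1-x)\ln(1-\theta),$$

$$\frac{\partial\ln f(x;\theta)}{\partial\theta}=\frac{x}{\theta}-\frac{1-x}{1-\theta},$$

and

$$\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}=-\frac{x}{\theta^2}-\frac{1-x}{(1-\theta)^2}.$$

Clearly,

$$I(\theta)=-E\left[-\frac{X}{\theta^2}-\frac{1-X}{(1-\theta)^2}\right]
=\frac{\theta}{\theta^2}+\frac{1-\theta}{(1-\theta)^2}=\frac{1}{\theta}+\frac{1}{1-\theta}=\frac{1}{\theta(1-\theta)},$$

which is larger for $\theta$ values close to zero or 1.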
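
For the $b(1,\theta)$ case the two expressions for Fisher information can be compared by direct summation over $x\in\{0,1\}$. The following sketch is illustrative only; the function name and the test value $\theta=0.3$ are choices made here.

```python
def fisher_info_bernoulli(theta):
    """Return I(theta) for b(1, theta) computed two ways:
    as E[(d ln f / d theta)^2] and as -E[d^2 ln f / d theta^2]."""
    pmf = {0: 1.0 - theta, 1: theta}

    def score(x):
        # First derivative of ln f(x; theta) with respect to theta.
        return x / theta - (1 - x) / (1.0 - theta)

    def second_derivative(x):
        # Second derivative of ln f(x; theta) with respect to theta.
        return -x / theta ** 2 - (1 - x) / (1.0 - theta) ** 2

    from_score = sum(score(x) ** 2 * p for x, p in pmf.items())
    from_second = -sum(second_derivative(x) * p for x, p in pmf.items())
    return from_score, from_second


a, b = fisher_info_bernoulli(0.3)
print(a, b, 1.0 / (0.3 * 0.7))
```

All three printed numbers should agree up to rounding, matching $1/[\theta(1-\theta)]$.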


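Suppose that $X_1,X_2,\ldots,X_n$ is a random sample from a
distribution having p.d.f. $f(x;\theta)$. Thus the likelihood function (the joint
p.d.f. of $X_1,X_2,\ldots,X_n$) is

$$L(\theta)=f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta).$$

Of course,

$$\ln L(\theta)=\ln f(x_1;\theta)+\ln f(x_2;\theta)+\cdots+\ln f(x_n;\theta)$$

and

$$\frac{\partial\ln L(\theta)}{\partial\theta}=\frac{\partial\ln f(x_1;\theta)}{\partial\theta}+\frac{\partial\ln f(x_2;\theta)}{\partial\theta}+\cdots+\frac{\partial\ln f(x_n;\theta)}{\partial\theta}. \tag{3}$$

It seems reasonable to define the Fisher information in the random
sample as

$$I_n(\theta)=E\left\{\left[\frac{\partial\ln L(\theta)}{\partial\theta}\right]^2\right\}.$$

Note if we square Equation (3), we obtain cross-product terms like

$$2E\left[\frac{\partial\ln f(X_i;\theta)}{\partial\theta}\,\frac{\partial\ln f(X_j;\theta)}{\partial\theta}\right],\qquad i\neq j,$$

which from the independence of $X_i$ and $X_j$ equals

$$2E\left[\frac{\partial\ln f(X_i;\theta)}{\partial\theta}\right]E\left[\frac{\partial\ln f(X_j;\theta)}{\partial\theta}\right]=0.$$

The fact that this product equals zero follows immediately from
Equation (1). Hence we have the result that

$$I_n(\theta)=\sum_{i=1}^{n}E\left\{\left[\frac{\partial\ln f(X_i;\theta)}{\partial\theta}\right]^2\right\}.$$

However, each term of this summation equals $I(\theta)$, and hence

$$I_n(\theta)=nI(\theta).$$

That is, the Fisher information in a random sample of size $n$ is $n$ times
the Fisher information in one observation. So, in the two examples of
this section, the Fisher information in a random sample of size $n$ is $n/\sigma^2$
in Example 1 and $n/[\theta(1-\theta)]$ in Example 2.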

We can now prove a very important inequality involving the 
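variance of an estimator, say $Y=u(X_1,X_2,\ldots,X_n)$, of $\theta$, which can
be biased. Suppose that

$$E(Y)=E[u(X_1,X_2,\ldots,X_n)]=k(\theta).$$

That is, in the continuous case,

$$k(\theta)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}u(x_1,\ldots,x_n)f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n;$$

$$k'(\theta)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}u(x_1,x_2,\ldots,x_n)\left[\sum_{i=1}^{n}\frac{1}{f(x_i;\theta)}\frac{\partial f(x_i;\theta)}{\partial\theta}\right]f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n$$

$$\phantom{k'(\theta)}=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}u(x_1,x_2,\ldots,x_n)\left[\sum_{i=1}^{n}\frac{\partial\ln f(x_i;\theta)}{\partial\theta}\right]f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n. \tag{4}$$

Define the random variable $Z$ by $Z=\sum_{i=1}^{n}[\partial\ln f(X_i;\theta)/\partial\theta]$. In accordance
with Equation (1) we have $E(Z)=\sum_{i=1}^{n}E[\partial\ln f(X_i;\theta)/\partial\theta]=0$.
Moreover, $Z$ is the sum of $n$ independent random variables each with
mean zero and consequently with variance $E\{[\partial\ln f(X;\theta)/\partial\theta]^2\}$. Hence
the variance of $Z$ is the sum of the $n$ variances,

$$\sigma_Z^2=nE\left\{\left[\frac{\partial\ln f(X;\theta)}{\partial\theta}\right]^2\right\}=I_n(\theta)=nI(\theta).$$

Because $Y=u(X_1,\ldots,X_n)$ and $Z=\sum_{i=1}^{n}[\partial\ln f(X_i;\theta)/\partial\theta]$, Equation (4)
shows that $E(YZ)=k'(\theta)$. Recall that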

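$$E(YZ)=E(Y)E(Z)+\rho\sigma_Y\sigma_Z,$$

where $\rho$ is the correlation coefficient of $Y$ and $Z$. Since $E(Y)=k(\theta)$ and
$E(Z)=0$, we have

$$k'(\theta)=k(\theta)\cdot0+\rho\sigma_Y\sigma_Z\qquad\text{or}\qquad\rho=\frac{k'(\theta)}{\sigma_Y\sigma_Z}.$$

Now $\rho^2\le1$. Hence

$$\frac{[k'(\theta)]^2}{\sigma_Y^2\sigma_Z^2}\le1\qquad\text{or}\qquad\frac{[k'(\theta)]^2}{\sigma_Z^2}\le\sigma_Y^2.$$

If we replace $\sigma_Z^2$ by its value, we have

$$\sigma_Y^2\ge\frac{[k'(\theta)]^2}{nE\left\{\left[\dfrac{\partial\ln f(X;\theta)}{\partial\theta}\right]^2\right\}}=\frac{[k'(\theta)]^2}{nI(\theta)}.$$

This inequality is known as the Rao-Cramér inequality.

If $Y=u(X_1,X_2,\ldots,X_n)$ is an unbiased estimator of $\theta$, so that
$k(\theta)=\theta$, then the Rao-Cramér inequality becomes, since $k'(\theta)=1$,

$$\sigma_Y^2\ge\frac{1}{nI(\theta)}.$$

Note that in Examples 1 and 2 of this section $1/[nI(\theta)]$ equals $\sigma^2/n$ and
$\theta(1-\theta)/n$, respectively. In each case, the unbiased estimator, $\bar{X}$, of $\theta$,
which is based upon the sufficient statistic for $\theta$, has a variance that is
equal to this Rao-Cramér lower bound of $1/[nI(\theta)]$.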

We now make the following definitions. 

Definition 1. Let $Y$ be an unbiased estimator of a parameter $\theta$ in
such a case of point estimation. The statistic $Y$ is called an efficient
estimator of $\theta$ if and only if the variance of $Y$ attains the Rao-Cramér
lower bound.

Definition 2. In cases in which we can differentiate with respect to
a parameter under an integral or summation symbol, the ratio of the
Rao-Cramér lower bound to the actual variance of any unbiased
estimator of a parameter is called the efficiency of that statistic.


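Example 3. Let $X_1,X_2,\ldots,X_n$ denote a random sample from a Poisson
distribution that has the mean $\theta>0$. It is known that $\bar{X}$ is an m.l.e. of $\theta$;
we shall show that it is also an efficient estimator of $\theta$. We have

$$\frac{\partial\ln f(x;\theta)}{\partial\theta}=\frac{\partial}{\partial\theta}(x\ln\theta-\theta-\ln x!)=\frac{x}{\theta}-1=\frac{x-\theta}{\theta}.$$

Accordingly,

$$E\left\{\left[\frac{\partial\ln f(X;\theta)}{\partial\theta}\right]^2\right\}=\frac{E(X-\theta)^2}{\theta^2}=\frac{\sigma^2}{\theta^2}=\frac{\theta}{\theta^2}=\frac{1}{\theta}.$$

The Rao-Cramér lower bound in this case is $1/[n(1/\theta)]=\theta/n$. But $\theta/n$ is the
variance of $\bar{X}$. Hence $\bar{X}$ is an efficient estimator of $\theta$.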


Example 4. Let $S^2$ denote the variance of a random sample of size $n>1$
from a distribution that is $N(\mu,\theta)$, $0<\theta<\infty$, where $\mu$ is known. We know
that $E[nS^2/(n-1)]=\theta$. What is the efficiency of the estimator $nS^2/(n-1)$? We
have

$$\ln f(x;\theta)=-\frac{(x-\mu)^2}{2\theta}-\frac{\ln(2\pi\theta)}{2},$$

$$\frac{\partial\ln f(x;\theta)}{\partial\theta}=\frac{(x-\mu)^2}{2\theta^2}-\frac{1}{2\theta},$$

and

$$\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}=-\frac{(x-\mu)^2}{\theta^3}+\frac{1}{2\theta^2}.$$

Accordingly,

$$-E\left[\frac{\partial^2\ln f(X;\theta)}{\partial\theta^2}\right]=\frac{\theta}{\theta^3}-\frac{1}{2\theta^2}=\frac{1}{2\theta^2}.$$

Thus the Rao-Cramér lower bound is $2\theta^2/n$. Now $nS^2/\theta$ is $\chi^2(n-1)$, so the
variance of $nS^2/\theta$ is $2(n-1)$. Accordingly, the variance of $nS^2/(n-1)$ is
$2(n-1)[\theta^2/(n-1)^2]=2\theta^2/(n-1)$. Thus the efficiency of the estimator
$nS^2/(n-1)$ is $(n-1)/n$. With $\mu$ known, what is the efficient estimator of the
variance?


Example 5. Let $X_1,X_2,\ldots,X_n$ denote a random sample of size $n>2$ from
a distribution with p.d.f.

$$f(x;\theta)=\theta x^{\theta-1}=\exp(\theta\ln x-\ln x+\ln\theta),\qquad 0<x<1,$$

and $f(x;\theta)=0$ elsewhere.

It is easy to verify that the Rao-Cramér lower bound is $\theta^2/n$. Let
$Y_i=-\ln X_i$. We shall indicate that each $Y_i$ has a gamma distribution. The
associated transformation $y_i=-\ln x_i$, with inverse $x_i=e^{-y_i}$, is one-to-one and
the transformation maps the space $\{x_i:0<x_i<1\}$ onto the space
$\{y_i:0<y_i<\infty\}$. We have $|J|=e^{-y_i}$. Thus $Y_i$ has a gamma distribution with
$\alpha=1$ and $\beta=1/\theta$. Let $Z=-\sum_{i=1}^{n}\ln X_i$. Then $Z$ has a gamma distribution with
$\alpha=n$ and $\beta=1/\theta$. Accordingly, we have $E(Z)=\alpha\beta=n/\theta$. This suggests that
we compute the expectation of $1/Z$ to see if we can find an unbiased estimator
of $\theta$. A simple integration shows that $E(1/Z)=\theta/(n-1)$. Hence $(n-1)/Z$ is
an unbiased estimator of $\theta$. With $n>2$, the variance of $(n-1)/Z$ exists and
is found to be $\theta^2/(n-2)$, so that the efficiency of $(n-1)/Z$ is $(n-2)/n$. This
efficiency tends to 1 as n increases. In such an instance, the estimator is said 
to be asymptotically efficient. 
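
The claims of Example 5 can be checked roughly by simulation. The sketch below is not from the text: it assumes $\theta=2$, $n=10$, a fixed seed, and the standard fact that if $U$ is uniform on $(0,1]$ then $U^{1/\theta}$ has p.d.f. $\theta x^{\theta-1}$ on $0<x<1$. With these settings the estimator $(n-1)/Z$ should have mean near $\theta=2$ and variance near $\theta^2/(n-2)=0.5$, compared with the lower bound $\theta^2/n=0.4$.

```python
import math
import random


def simulate_estimator(theta, n, reps, seed=0):
    """Simulate (n - 1) / Z, where Z = -sum(ln X_i) and X_i has p.d.f.
    theta * x**(theta - 1) on (0, 1). Returns the sample mean and variance."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # X = U**(1/theta) with U uniform on (0, 1], so -ln X = -(ln U) / theta.
        z = -sum(math.log(1.0 - rng.random()) for _ in range(n)) / theta
        values.append((n - 1) / z)
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / reps
    return mean, var


theta, n = 2.0, 10
mean, var = simulate_estimator(theta, n, reps=20000)
bound = theta ** 2 / n  # Rao-Cramer lower bound for unbiased estimators
print("mean:", mean, "variance:", var, "estimated efficiency:", bound / var)
```

The estimated efficiency is a Monte Carlo quantity and will fluctuate around $(n-2)/n=0.8$ from seed to seed.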

The concept of joint efficient estimators of several parameters has 
been developed along with the associated concept of joint efficiency of 
several estimators. But limitations of space prevent their inclusion in 
this book. 


EXERCISES 


8.11. Prove that $\bar{X}$, the mean of a random sample of size $n$ from a
distribution that is $N(\theta,\sigma^2)$, $-\infty<\theta<\infty$, is, for every known $\sigma^2>0$, an
efficient estimator of $\theta$.


8.12. Show that the mean $\bar{X}$ of a random sample of size $n$ from a distribution
which is $b(1,\theta)$, $0<\theta<1$, is an efficient estimator of $\theta$.

8.13. Given $f(x;\theta)=1/\theta$, $0<x<\theta$, zero elsewhere, with $\theta>0$, formally
compute the reciprocal of

$$nE\left\{\left[\frac{\partial\ln f(X;\theta)}{\partial\theta}\right]^2\right\}.$$

Compare this with the variance of $(n+1)Y_n/n$, where $Y_n$ is the largest item
of a random sample of size $n$ from this distribution. Comment.


8.14. Given the p.d.f.

$$f(x;\theta)=\frac{1}{\pi[1+(x-\theta)^2]},\qquad-\infty<x<\infty,\quad-\infty<\theta<\infty.$$

Show that the Rao-Cramér lower bound is $2/n$, where $n$ is the size of a
random sample from this Cauchy distribution.


8.15. Let $X$ have a gamma distribution with $\alpha=4$ and $\beta=\theta>0$.

(a) Find the Fisher information $I(\theta)$.

(b) If $X_1,X_2,\ldots,X_n$ is a random sample from this distribution, show that
the m.l.e. of $\theta$ is an efficient estimator of $\theta$.

8.16. Let $X$ be $N(0,\theta)$, $0<\theta<\infty$.

(a) Find the Fisher information $I(\theta)$.

(b) If $X_1,X_2,\ldots,X_n$ is a random sample from this distribution, show that
the m.l.e. of $\theta$ is an efficient estimator of $\theta$.


8.3 Limiting Distributions of Maximum Likelihood Estimators 

We use the notation and assumptions of Section 8.2 as much as 
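possible here. In particular, $f(x;\theta)$ is the p.d.f., $I(\theta)$ is the Fisher
information, and the likelihood function is

$$L(\theta)=f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta).$$

Also, we can differentiate under the integral (summation) sign, so that

$$Z=\frac{\partial\ln L(\theta)}{\partial\theta}=\sum_{i=1}^{n}\frac{\partial\ln f(X_i;\theta)}{\partial\theta}$$

has mean zero and variance $nI(\theta)$. In addition, we want to be able to
find the maximum likelihood estimator $\hat\theta$ by solving

$$\frac{\partial[\ln L(\theta)]}{\partial\theta}=0.$$

That is,

$$\frac{\partial[\ln L(\hat\theta)]}{\partial\theta}=0,$$

where now, with $\hat\theta$ in this expression, $L(\hat\theta)=f(X_1;\hat\theta)f(X_2;\hat\theta)\cdots f(X_n;\hat\theta)$.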

We can approximate the left-hand member of this latter equation by 
a linear function found from the first two terms of a Taylor’s series 
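expanded about $\theta$, namely

$$\frac{\partial[\ln L(\theta)]}{\partial\theta}+(\hat\theta-\theta)\frac{\partial^2[\ln L(\theta)]}{\partial\theta^2}\approx0,$$

when $L(\theta)=f(X_1;\theta)f(X_2;\theta)\cdots f(X_n;\theta)$.

Obviously, this approximation is good enough only if $\hat\theta$ is close to
$\theta$, and an adequate mathematical proof involves certain regularity
conditions, all of which we have not given here. But a heuristic
argument can be made by solving for $\hat\theta-\theta$ to obtain

$$\hat\theta-\theta=\frac{\dfrac{\partial[\ln L(\theta)]}{\partial\theta}}{-\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}}=\frac{Z}{-\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}}.$$

Let us rewrite this equation as

$$\sqrt{nI(\theta)}\,(\hat\theta-\theta)=\frac{Z/\sqrt{nI(\theta)}}{-\dfrac{1}{n}\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}\Big/I(\theta)}. \tag{1}$$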


Since $Z$ is the sum of the i.i.d. random variables

$$\frac{\partial\ln f(X_i;\theta)}{\partial\theta},\qquad i=1,2,\ldots,n,$$

each with mean zero and variance $I(\theta)$, the numerator of the right-hand
member of Equation (1) is limiting $N(0,1)$ by the central limit theorem.
Moreover, the mean

$$\frac{1}{n}\sum_{i=1}^{n}\left[-\frac{\partial^2\ln f(X_i;\theta)}{\partial\theta^2}\right]$$

converges in probability to its expected value, namely $I(\theta)$. So the
denominator of the right-hand member of Equation (1) converges in
probability to 1. Thus, by Slutsky's theorem given in Section 5.5, the
right-hand member of Equation (1) is limiting $N(0,1)$. Hence the
left-hand member also has this limiting standard normal distribution.
That means that we can say that $\hat\theta$ has an approximate normal
distribution with mean $\theta$ and variance $1/[nI(\theta)]$.

The preceding result means that in a regular case of estimation and 
in some limiting sense, the m.l.e. $\hat\theta$ is unbiased and its variance achieves
the Rao-Cramér lower bound. That is, the m.l.e. $\hat\theta$ is asymptotically
efficient.

Example 1. In Exercise 8.14 we examined the Rao-Cramér lower bound
of the variance of an unbiased estimator of $\theta$, the median of a certain Cauchy
distribution. We now know that the m.l.e. $\hat\theta$ of $\theta$ has an approximate normal
distribution with mean $\theta$ and variance equal to the lower bound of $2/n$. Hence,
once we compute $\hat\theta$, we can say, for illustration, that $\hat\theta\pm1.96\sqrt{2/n}$ provides
an approximate 95 percent confidence interval for $\theta$.
To determine $\hat\theta$, there are many numerical methods that can be
used. In the Cauchy case, one of the easiest is given by the following:

$$\frac{\partial\ln L(\theta)}{\partial\theta}=\sum_{i=1}^{n}\frac{2(x_i-\theta)}{1+(x_i-\theta)^2}=0.$$

In the denominator of the right-hand member, we use a preliminary
estimate of $\theta$ that is not influenced too much by extreme observations.
For illustration, the sample median, say $\hat\theta_0$, is a very good one while the
sample mean $\bar{x}$ would be a poor choice. This provides weights

$$w_{i1}=\frac{2}{1+(x_i-\hat\theta_0)^2},\qquad i=1,2,\ldots,n,$$

so that we can solve

$$0=\sum_{i=1}^{n}(w_{i1})(x_i-\theta)\qquad\text{to get}\qquad\hat\theta_1=\frac{\sum w_{i1}x_i}{\sum w_{i1}}.$$

Now $\hat\theta_1$ can be used to obtain new weights and $\hat\theta_2$:

$$w_{i2}=\frac{2}{1+(x_i-\hat\theta_1)^2},\qquad\hat\theta_2=\frac{\sum w_{i2}x_i}{\sum w_{i2}}.$$

This iterative process can be continued until adequate convergence is
obtained; that is, at some step $k$, $\hat\theta_k$ will be close enough to $\hat\theta$ to be used
as the m.l.e.
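
The reweighting scheme just described translates almost line for line into code. The following Python sketch is one possible implementation, not the authors'; the stopping tolerance, the iteration cap, and the small data set are assumptions made here for illustration.

```python
import statistics


def cauchy_score(xs, theta):
    """Derivative of ln L(theta) for a Cauchy sample with center theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


def cauchy_mle(xs, tol=1e-10, max_iter=500):
    """Iteratively reweighted mean, started at the sample median."""
    theta = statistics.median(xs)  # preliminary estimate, resistant to outliers
    for _ in range(max_iter):
        weights = [2.0 / (1.0 + (x - theta) ** 2) for x in xs]
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta  # last iterate if the tolerance was not reached


# Hypothetical data with one extreme observation.
data = [-1.2, 0.3, 0.8, 1.1, 1.9, 25.0]
theta_hat = cauchy_mle(data)
print(theta_hat, cauchy_score(data, theta_hat))
```

At convergence the score should be essentially zero, and the outlying value 25.0 receives a very small weight, so the estimate stays near the bulk of the data rather than near the sample mean.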

Example 2. Suppose that the random sample arises from a distribution 
with p.d.f. 

$$f(x;\theta)=\theta x^{\theta-1},\qquad 0<x<1,\quad\theta\in\Omega=\{\theta:0<\theta<\infty\},$$

zero elsewhere. We have

$$\ln f(x;\theta)=\ln\theta+(\theta-1)\ln x,$$

$$\frac{\partial\ln f(x;\theta)}{\partial\theta}=\frac{1}{\theta}+\ln x,$$

and

$$\frac{\partial^2\ln f(x;\theta)}{\partial\theta^2}=-\frac{1}{\theta^2}.$$

Since $E(-1/\theta^2)=-1/\theta^2$, the lower bound of the variance of every unbiased
estimator of $\theta$ is $\theta^2/n$. Moreover, the maximum likelihood estimator
$\hat\theta=-n/\ln\prod_{i=1}^{n}X_i$ has an approximate normal distribution with mean $\theta$ and
variance $\theta^2/n$. Thus, in a limiting sense, $\hat\theta$ is the unbiased minimum variance
estimator of $\theta$; that is, $\hat\theta$ is asymptotically efficient.

Example 3. The m.l.e. for $\theta$ in


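$$f(x;\theta)=\frac{\theta^{x}e^{-\theta}}{x!},\qquad x=0,1,2,\ldots,\quad\theta\in\Omega=\{\theta:0<\theta<\infty\},$$

is $\hat\theta=\bar{X}$, the mean of a random sample. Now

$$\ln f(x;\theta)=x\ln\theta-\theta-\ln x!$$

and

$$\frac{\partial[\ln f(x;\theta)]}{\partial\theta}=\frac{x}{\theta}-1\qquad\text{and}\qquad\frac{\partial^2[\ln f(x;\theta)]}{\partial\theta^2}=-\frac{x}{\theta^2}.$$

Thus

$$-E\left(-\frac{X}{\theta^2}\right)=\frac{\theta}{\theta^2}=\frac{1}{\theta}$$

and $\hat\theta=\bar{X}$ has an approximate normal distribution with mean $\theta$ and standard
deviation $\sqrt{\theta/n}$. That is, $Y=(\bar{X}-\theta)/\sqrt{\theta/n}$ has a limiting standard normal
distribution. The problem in practice is how best to estimate the standard
deviation in the denominator of $Y$. Clearly, we might use $\bar{X}$ for $\theta$ there, but
does that create too much dependence between the numerator and
denominator? If so, this requires a very large sample size for $(\bar{X}-\theta)/\sqrt{\bar{X}/n}$
to have an approximate normal distribution. It might be better to approximate
$I(\theta)$ by

$$\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial\ln f(x_i;\theta)}{\partial\theta}\right]^2_{\theta=\bar{x}}=\frac{1}{n}\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{\bar{x}^2}=\frac{s^2}{\bar{x}^2}.$$

Thus $nI(\theta)$ is approximated by $ns^2/\bar{x}^2$ and we can say that

$$\frac{\bar{X}-\theta}{\bar{x}/(s\sqrt{n})}$$

is approximately $N(0,1)$. We do not know exactly which of these two
solutions, or others like simply using $s/\sqrt{n}$ in the denominator, is best.
Fortunately, however, if the Poisson model is correct, usually

$$\sqrt{\frac{\bar{x}}{n}}\approx\frac{\bar{x}}{s\sqrt{n}}\approx\frac{s}{\sqrt{n}}.$$

If this is not true, we should check the Poisson assumption, which requires,
among other things, that $\mu=\sigma^2$. Hence, for illustration, either

$$\bar{x}\pm1.96\sqrt{\frac{\bar{x}}{n}}\qquad\text{or}\qquad\bar{x}\pm\frac{1.96\,\bar{x}}{s\sqrt{n}}\qquad\text{or}\qquad\bar{x}\pm\frac{1.96\,s}{\sqrt{n}}$$

serves as an approximate 95 percent confidence interval for $\theta$. In situations
like this, we recommend that a person try all three because they should be in
substantial agreement. If not, check the Poisson assumption.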
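
The recommendation to try all three intervals is easy to follow in code. The sketch below is illustrative only: the function name and the data are hypothetical, and $s$ is computed with divisor $n$, which is assumed here to match the sample variance $S^2$ used in this chapter.

```python
import math
import statistics


def poisson_intervals(xs):
    """Three approximate 95 percent intervals for a Poisson mean, each of the
    form xbar +/- 1.96 * (estimated standard deviation of the sample mean)."""
    n = len(xs)
    xbar = sum(xs) / n
    s = statistics.pstdev(xs)  # square root of sum((x - xbar)^2) / n
    half_widths = {
        "sqrt(xbar/n)": math.sqrt(xbar / n),
        "xbar/(s*sqrt(n))": xbar / (s * math.sqrt(n)),
        "s/sqrt(n)": s / math.sqrt(n),
    }
    return {name: (xbar - 1.96 * h, xbar + 1.96 * h)
            for name, h in half_widths.items()}


# Hypothetical counts whose sample variance (3.6) is close to the mean (4.0).
counts = [2, 7, 4, 3, 6, 1, 5, 4, 2, 6]
for name, (lo, hi) in poisson_intervals(counts).items():
    print(name, round(lo, 3), round(hi, 3))
```

When the three intervals differ noticeably, the sample mean and variance disagree, which is the signal to question the Poisson assumption.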
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The fact that the m.l.e. 6 has an approximate normal distribution 
with mean $\theta$ and variance $1/[nI(\theta)]$ suggests that $\hat\theta$ (really a sequence
$\hat\theta_1,\hat\theta_2,\ldots,\hat\theta_n,\ldots$) converges in probability to $\theta$. Of course, $\hat\theta_n$ can
be biased; say $E(\hat\theta_n-\theta)=b_n(\theta)$, where $b_n(\theta)$ is the bias. However, $b_n(\theta)$
equals zero in the limit. Moreover, if we assume that the variances exist
and

$$\lim_{n\to\infty}[\operatorname{var}(\hat\theta_n)]=\lim_{n\to\infty}\frac{1}{nI(\theta)},$$

then the limit of the variances is obviously zero. Hence, from
Chebyshev's inequality, we have

$$\Pr[|\hat\theta_n-\theta|\ge\epsilon]\le\frac{E[(\hat\theta_n-\theta)^2]}{\epsilon^2}.$$

However,

$$\lim_{n\to\infty}E[(\hat\theta_n-\theta)^2]=\lim_{n\to\infty}[b_n^2(\theta)+\operatorname{var}(\hat\theta_n)]=0$$

and thus

$$\lim_{n\to\infty}\Pr[|\hat\theta_n-\theta|\ge\epsilon]=0$$

for each fixed $\epsilon>0$. Any estimator, not just maximum likelihood
estimators, that enjoys this property is said to be a consistent estimator
of $\theta$. As illustrations, we note that all the unbiased estimators based
upon the complete sufficient statistics in Chapter 7 and all the
estimators in Sections 8.1 and 8.2 are consistent ones.

We close this section by considering the extension of these limiting 
distributions to maximum likelihood estimators of two or more 
parameters. For convenience, we restrict ourselves to the regular case 
involving two parameters, but the extension to more than two is 
obvious once the reader understands multivariate normal distributions 
(Section 4.10). 

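Suppose that the random sample $X_1,X_2,\ldots,X_n$ arises from a
distribution with p.d.f. $f(x;\theta_1,\theta_2)$, $(\theta_1,\theta_2)\in\Omega$, in which regularity
conditions exist. Without describing these conditions in any detail, let
us simply say that the space of $X$ where $f(x;\theta_1,\theta_2)>0$ does not
involve $\theta_1$ and $\theta_2$, and we are able to differentiate under the integral
(summation) signs. The information matrix of the sample is equal to

$$\mathbf{I}_n=n\begin{pmatrix}
E\left\{\left[\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_1}\right]^2\right\} &
E\left\{\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_1}\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_2}\right\}\\[2ex]
E\left\{\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_1}\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_2}\right\} &
E\left\{\left[\dfrac{\partial\ln f(X;\theta_1,\theta_2)}{\partial\theta_2}\right]^2\right\}
\end{pmatrix}$$

$$\phantom{\mathbf{I}_n}=-n\begin{pmatrix}
E\left[\dfrac{\partial^2\ln f(X;\theta_1,\theta_2)}{\partial\theta_1^2}\right] &
E\left[\dfrac{\partial^2\ln f(X;\theta_1,\theta_2)}{\partial\theta_1\,\partial\theta_2}\right]\\[2ex]
E\left[\dfrac{\partial^2\ln f(X;\theta_1,\theta_2)}{\partial\theta_1\,\partial\theta_2}\right] &
E\left[\dfrac{\partial^2\ln f(X;\theta_1,\theta_2)}{\partial\theta_2^2}\right]
\end{pmatrix}.$$

One can immediately see the similarity of this to the one-parameter
case.

If $\hat\theta_1$ and $\hat\theta_2$ are maximum likelihood estimators of $\theta_1$ and $\theta_2$, then
$\hat\theta_1$ and $\hat\theta_2$ have an approximate bivariate normal distribution with
means $\theta_1$ and $\theta_2$ and variance-covariance matrix $\mathbf{I}_n^{-1}$. That is, the
approximate variances and covariances are found, respectively, in the
matrix

$$\mathbf{I}_n^{-1}=\begin{pmatrix}\operatorname{var}(\hat\theta_1) & \operatorname{cov}(\hat\theta_1,\hat\theta_2)\\ \operatorname{cov}(\hat\theta_1,\hat\theta_2) & \operatorname{var}(\hat\theta_2)\end{pmatrix}.$$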


An illustration will help us understand this result that has simply been 
given to the reader to accept without any mathematical derivation. 

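Example 4. Let the random sample $X_1,X_2,\ldots,X_n$ arise from $N(\theta_1,\theta_2)$.
Then

$$\ln f(x;\theta_1,\theta_2)=-\frac{1}{2}\ln(2\pi\theta_2)-\frac{(x-\theta_1)^2}{2\theta_2},$$

$$\frac{\partial\ln f(x;\theta_1,\theta_2)}{\partial\theta_1}=\frac{x-\theta_1}{\theta_2},$$

$$\frac{\partial\ln f(x;\theta_1,\theta_2)}{\partial\theta_2}=-\frac{1}{2\theta_2}+\frac{(x-\theta_1)^2}{2\theta_2^2},$$

$$\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_1^2}=-\frac{1}{\theta_2},$$

$$\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_1\,\partial\theta_2}=-\frac{x-\theta_1}{\theta_2^2},$$

$$\frac{\partial^2\ln f(x;\theta_1,\theta_2)}{\partial\theta_2^2}=\frac{1}{2\theta_2^2}-\frac{(x-\theta_1)^2}{\theta_2^3}.$$

If we take the expected value of these three second partial derivatives and
multiply by $-n$, we obtain the information matrix of the sample, namely,

$$\mathbf{I}_n=\begin{pmatrix}\dfrac{n}{\theta_2} & 0\\[1.5ex] 0 & \dfrac{n}{2\theta_2^2}\end{pmatrix}.$$

Hence the approximate variance-covariance matrix of the maximum
likelihood estimators $\hat\theta_1=\bar{X}$ and $\hat\theta_2=S^2$ is

$$\mathbf{I}_n^{-1}=\begin{pmatrix}\dfrac{\theta_2}{n} & 0\\[1.5ex] 0 & \dfrac{2\theta_2^2}{n}\end{pmatrix}.$$

It is not surprising that the covariance equals zero as we know that $\bar{X}$ and
$S^2$ are independent. In addition, we know that

$$\operatorname{var}(\bar{X})=\frac{\theta_2}{n}$$

and

$$\operatorname{var}(S^2)=\operatorname{var}\left[\frac{\theta_2}{n}\left(\frac{nS^2}{\theta_2}\right)\right]=\frac{\theta_2^2}{n^2}\operatorname{var}\left(\frac{nS^2}{\theta_2}\right)=\frac{2(n-1)\theta_2^2}{n^2},$$

since $nS^2/\theta_2$ is $\chi^2(n-1)$. While $\operatorname{var}(S^2)\neq2\theta_2^2/n$, it is true that

$$\frac{2\theta_2^2}{n}\approx\frac{2(n-1)\theta_2^2}{n^2}$$

for large $n$.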
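
To see how close the approximate variance of $S^2$ is to the exact one, the two expressions can be compared for a few sample sizes. This is a small illustrative sketch; the function names and the value $\theta_2=3$ are assumptions made here.

```python
def approx_covariance(n, theta2):
    """Inverse of the information matrix diag(n/theta2, n/(2*theta2**2))."""
    return [[theta2 / n, 0.0],
            [0.0, 2.0 * theta2 ** 2 / n]]


def exact_var_s2(n, theta2):
    """Exact variance of S^2 (divisor n) for a normal sample: 2(n-1)theta2^2/n^2."""
    return 2.0 * (n - 1) * theta2 ** 2 / n ** 2


for n in (5, 50, 500):
    approx = approx_covariance(n, 3.0)[1][1]
    exact = exact_var_s2(n, 3.0)
    print(n, approx, exact, exact / approx)  # the ratio is (n - 1) / n
```

The ratio of exact to approximate variance is $(n-1)/n$, which approaches 1 as $n$ grows.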


EXERCISES 


8.17. Let $X_1,X_2,\ldots,X_n$ be a random sample from each of the following
distributions. In each case, find the m.l.e. $\hat\theta$, $\operatorname{var}(\hat\theta)$, $1/[nI(\theta)]$, where $I(\theta)$ is
the Fisher information of a single observation $X$, and compare $\operatorname{var}(\hat\theta)$ and
$1/[nI(\theta)]$.

(a) $b(1,\theta)$, $0\le\theta\le1$.

(b) $N(\theta,1)$, $-\infty<\theta<\infty$.

(c) $N(0,\theta)$, $0<\theta<\infty$.

(d) Gamma ($\alpha=5$, $\beta=\theta$), $0<\theta<\infty$.

8.18. Referring to Exercise 8.17 and using the fact that $\hat\theta$ has an approximate
$N[\theta,1/nI(\theta)]$, in each case construct an approximate 95 percent confidence
interval for $\theta$.

8.19. Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be a random sample from a
bivariate normal distribution with unknown means $\theta_1$ and $\theta_2$ and with
known variances and correlation coefficient, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, respectively. Find
the maximum likelihood estimators $\hat\theta_1$ and $\hat\theta_2$ of $\theta_1$ and $\theta_2$ and their
approximate variance-covariance matrix. In this case, does the latter provide the
exact variances and covariance?

8.20. Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be a random sample from a
bivariate normal distribution with means equal to zero and variances $\theta_1$ and
$\theta_2$, respectively, and known correlation coefficient $\rho$. Find the maximum
likelihood estimators $\hat\theta_1$ and $\hat\theta_2$ of $\theta_1$ and $\theta_2$ and their approximate
variance-covariance matrix.


8.4 Robust M-Estimation

In Example 1 of Section 8.3 we found the m.l.e. of the center $\theta$ of
the Cauchy distribution with p.d.f.

$$f(x;\theta)=\frac{1}{\pi[1+(x-\theta)^2]},\qquad-\infty<x<\infty,$$

where $-\infty<\theta<\infty$. The logarithm of the likelihood function of a
random sample $X_1,X_2,\ldots,X_n$ from this distribution is

$$\ln L(\theta)=-n\ln\pi-\sum_{i=1}^{n}\ln[1+(x_i-\theta)^2].$$

To maximize, we differentiated $\ln L(\theta)$ to obtain

$$\frac{d\ln L(\theta)}{d\theta}=\sum_{i=1}^{n}\frac{2(x_i-\theta)}{1+(x_i-\theta)^2}=0.$$

The solution of this equation cannot be found in closed form, but the
equation can be solved by some iterative process. There, to do this, we
used the weight function

$$w(x-\hat\theta_0)=\frac{2}{1+(x-\hat\theta_0)^2},$$

where $\hat\theta_0$ is some preliminary estimator of $\theta$, like the sample median.
Note that values of $x$ for which $|x-\hat\theta_0|$ is relatively large do not have
much weight. That is, in finding the maximum likelihood estimator of
$\theta$, the outlying values are downweighted greatly.

The generalization of this special case is described as follows. Let 
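$X_1,X_2,\ldots,X_n$ be a random sample from a distribution with a p.d.f.
of the form $f(x-\theta)$, where $\theta$ is a location parameter such that
$-\infty<\theta<\infty$. Thus

$$\ln L(\theta)=\sum_{i=1}^{n}\ln f(x_i-\theta)=-\sum_{i=1}^{n}\rho(x_i-\theta),$$

where $\rho(x)=-\ln f(x)$, and

$$\frac{d\ln L(\theta)}{d\theta}=-\sum_{i=1}^{n}\frac{f'(x_i-\theta)}{f(x_i-\theta)}=\sum_{i=1}^{n}\Psi(x_i-\theta),$$

where $\rho'(x)=\Psi(x)$. For the Cauchy distribution, we have that these
functions are

$$\rho(x)=\ln\pi+\ln(1+x^2),$$

and

$$\Psi(x)=\frac{2x}{1+x^2}.$$

In addition, we define a weight function as

$$w(x)=\frac{\Psi(x)}{x},$$

which equals $2/(1+x^2)$ in the Cauchy case.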

To appreciate how outlying observations are handled in estimating 
a center $\theta$ of different models progressing from a fairly light-tailed
distribution like the normal to a very heavy-tailed distribution like the
Cauchy, it is an easy exercise (Exercise 8.21) to show that the standard
normal distribution, with p.d.f. $\varphi(x)$, has

$$\rho(x)=\frac{1}{2}\ln2\pi+\frac{x^2}{2},\qquad\Psi(x)=x,\qquad w(x)=1.$$

That is, in estimating the center $\theta$ in $\varphi(x-\theta)$ each value of $x$ has the
weight 1 to yield the estimator $\hat\theta=\bar{X}$.
Also, the double exponential distribution, with p.d.f.

$$f(x)=\frac{1}{2}e^{-|x|},\qquad-\infty<x<\infty,$$

has, provided that $x\neq0$,

$$\rho(x)=\ln2+|x|,\qquad\Psi(x)=\operatorname{sign}(x),\qquad w(x)=\frac{\operatorname{sign}(x)}{x}=\frac{1}{|x|}.$$

Here $\hat\theta=\operatorname{median}(X_i)$ because in solving

$$\sum_{i=1}^{n}\Psi(x_i-\theta)=\sum_{i=1}^{n}\operatorname{sign}(x_i-\theta)=0$$

we need as many positive values of $x_i-\theta$ as negative values. The
weights in the double exponential case are of the order $1/|x-\theta|$, while
those in the Cauchy case are $2/[1+(x-\theta)^2]$. That is, in estimating the
center, outliers are downweighted more severely in a Cauchy situation,
as the tails of the distribution are heavier than those of the double
exponential distribution. On the other hand, extreme values from the
double exponential distribution are downweighted more than those
under normal assumptions in arriving at an estimate of the center $\theta$.
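
The ordering of the three weight functions can be seen by evaluating them at a few residuals. The sketch below is only an illustration; the function names and the chosen residual values are assumptions.

```python
def w_normal(x):
    """Weight for the standard normal model: Psi(x)/x = x/x = 1."""
    return 1.0


def w_double_exponential(x):
    """Weight for the double exponential model: sign(x)/x = 1/|x|, x != 0."""
    return 1.0 / abs(x)


def w_cauchy(x):
    """Weight for the Cauchy model: Psi(x)/x = 2/(1 + x^2)."""
    return 2.0 / (1.0 + x * x)


for residual in (2.0, 5.0, 10.0):
    print(residual, w_normal(residual),
          w_double_exponential(residual), w_cauchy(residual))
```

For residuals larger than 1 in absolute value the Cauchy weight is the smallest and the normal weight the largest, in line with the discussion above.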

Thus we suspect that the m.l.e. associated with one of these three 
distributions would not necessarily be a good estimator in another 
situation. This is true; for example, $\bar{X}$ is a very poor estimator of the
median of a Cauchy distribution, as the variance of $\bar{X}$ does not even
exist if the sample arises from a Cauchy distribution. Intuitively, $\bar{X}$ is
not a good estimator with the Cauchy distribution, because the very
small or very large values (outliers) that can arise from that distribution
influence the mean $\bar{X}$ of the sample too much.

An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator. Also estimators associated with the solution of the equation

$$\sum_{i=1}^{n} \Psi(x_i - \theta) = 0$$

are frequently called robust M-estimators (denoted by $\hat{\theta}$) because they can be thought of as maximum likelihood estimators. So in finding a robust M-estimator we must select a $\Psi$ function which will provide an estimator that is good for each distribution in the collection under consideration. For certain theoretical reasons that we cannot explain at this level, Huber suggested a $\Psi$ function that is a combination of those associated with the normal and double exponential distributions,

$$\Psi(x) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & k < x, \end{cases}$$

with weight $w(x) = 1$, $|x| \le k$, and $k/|x|$, provided that $k < |x|$. In Exercise 8.23 the reader is asked to find the p.d.f. $f(x)$ so that the M-estimator associated with this $\Psi$ function is the m.l.e. of the location parameter $\theta$ in the p.d.f. $f(x - \theta)$.
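The $\Psi$ and weight functions discussed above can be tabulated side by side to see how each model treats an outlying residual. The following is a minimal sketch; the function names (`huber_psi`, `huber_weight`, `cauchy_weight`, `double_exp_weight`) and the sample points are illustrative choices, not notation from the text.

```python
def huber_psi(x, k=1.5):
    """Huber's psi: -k for x < -k, x for |x| <= k, and k for x > k."""
    return max(-k, min(k, x))


def huber_weight(x, k=1.5):
    """w(x) = psi(x)/x for Huber's psi: 1 if |x| <= k, else k/|x|."""
    return 1.0 if abs(x) <= k else k / abs(x)


def cauchy_weight(x):
    """w(x) = 2/(1 + x^2), the weight in the Cauchy case."""
    return 2.0 / (1.0 + x * x)


def double_exp_weight(x):
    """w(x) = 1/|x| for the double exponential case (x != 0)."""
    return 1.0 / abs(x)


if __name__ == "__main__":
    # The normal model gives weight 1 everywhere; compare the others.
    for x in (0.5, 1.5, 3.0, 10.0):
        print(x, huber_weight(x), double_exp_weight(x), cauchy_weight(x))
```

For large $|x|$ the Cauchy weight falls off like $1/x^2$, while the Huber and double exponential weights fall off like $1/|x|$, which matches the ordering of downweighting described in the text.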

With Huber's $\Psi$ function, another problem arises. Note that if we double (for illustration) each $X_1, X_2, \ldots, X_n$, estimators such as $\bar{X}$ and median $(X_i)$ also double. This is not at all true with the solution of the equation

$$\sum_{i=1}^{n} \Psi(x_i - \theta) = 0,$$

where the $\Psi$ function is that of Huber. One way to avoid this difficulty is to solve another, but similar, equation instead,

$$\sum_{i=1}^{n} \Psi\!\left(\frac{x_i - \theta}{d}\right) = 0, \tag{1}$$

where $d$ is a robust estimate of the scale. A popular $d$ to use is

$$d = \frac{\operatorname{median} |x_i - \operatorname{median}(x_i)|}{0.6745}.$$

The divisor 0.6745 is inserted in the definition of $d$ because then $d$ is a consistent estimate of $\sigma$ and thus is about equal to $\sigma$, if the sample arises from a normal distribution. That is, $\sigma$ can be approximated by $d$ under normal assumptions.
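The scale estimate $d$ (the median absolute deviation from the median, divided by 0.6745) is simple to compute. The sketch below uses a hypothetical sample with one gross outlier; the function name `robust_scale` is an illustrative choice.

```python
from statistics import median


def robust_scale(xs):
    """d = median|x_i - median(x_i)| / 0.6745."""
    center = median(xs)
    return median([abs(x - center) for x in xs]) / 0.6745


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with an outlier
    print(robust_scale(sample))
```

Here the outlier 100.0 leaves $d$ unaffected, since only the middle absolute deviation enters the calculation; a sample standard deviation would be dominated by it.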

That scheme of selecting $d$ also provides us with a clue for selecting $k$. For if the sample actually arises from a normal distribution, we would want most of the values $x_1, x_2, \ldots, x_n$ to satisfy the inequality

$$\frac{|x_i - \theta|}{d} \le k$$

because then

$$\Psi\!\left(\frac{x_i - \theta}{d}\right) = \frac{x_i - \theta}{d}.$$

That is, for illustration, if all the values satisfy this inequality, then Equation (1) becomes

$$\sum_{i=1}^{n} \Psi\!\left(\frac{x_i - \theta}{d}\right) = \sum_{i=1}^{n} \frac{x_i - \theta}{d} = 0.$$

This has the solution $\bar{x}$, which of course is most desirable with normal distributions. Since $d$ approximates $\sigma$, popular values of $k$ to use are 1.5 and 2.0, because with those selections most normal variables would satisfy the desired inequality.

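Again an iterative process must usually be used to solve Equation (1). One such scheme, Newton's method, is described. Let $\hat{\theta}_0$ be a first estimate of $\theta$, such as $\hat{\theta}_0 = \operatorname{median}(x_i)$. Approximate the left-hand member of Equation (1) by the first two terms of Taylor's expansion about $\hat{\theta}_0$ to obtain

$$\sum_{i=1}^{n} \Psi\!\left(\frac{x_i - \hat{\theta}_0}{d}\right) + (\theta - \hat{\theta}_0) \sum_{i=1}^{n} \Psi'\!\left(\frac{x_i - \hat{\theta}_0}{d}\right)\left(-\frac{1}{d}\right) = 0,$$

approximately. The solution of this provides a second estimate of $\theta$,

$$\hat{\theta}_1 = \hat{\theta}_0 + \frac{d \sum_{i=1}^{n} \Psi\!\left(\dfrac{x_i - \hat{\theta}_0}{d}\right)}{\sum_{i=1}^{n} \Psi'\!\left(\dfrac{x_i - \hat{\theta}_0}{d}\right)},$$

which is called the one-step M-estimate of $\theta$. If we use $\hat{\theta}_1$ in place of $\hat{\theta}_0$, we obtain $\hat{\theta}_2$, the two-step M-estimate of $\theta$. This process can continue to obtain any desired degree of accuracy. With Huber's $\Psi$ function, the denominator of the second term,

$$\sum_{i=1}^{n} \Psi'\!\left(\frac{x_i - \hat{\theta}_0}{d}\right),$$

is particularly easy to compute because $\Psi'(x) = 1$, $-k \le x \le k$, and zero elsewhere. Thus, that denominator simply counts the number of $x_1, x_2, \ldots, x_n$ such that $|x_i - \hat{\theta}_0|/d \le k$.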
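The one-step update can be written out directly. The following is a minimal sketch assuming Huber's $\Psi$, the median as starting value, and the scale estimate $d$ defined earlier; the function names and the data are hypothetical and chosen only to show the effect of an outlier.

```python
from statistics import mean, median


def huber_psi(z, k):
    """Huber's psi evaluated at a standardized residual z."""
    return max(-k, min(k, z))


def one_step_huber(xs, k=1.5):
    """One Newton step from the median toward the root of Equation (1)."""
    theta0 = median(xs)
    d = median([abs(x - theta0) for x in xs]) / 0.6745
    z = [(x - theta0) / d for x in xs]
    numerator = d * sum(huber_psi(zi, k) for zi in z)
    # Sum of psi' values: counts the residuals with |z_i| <= k.
    denominator = sum(1 for zi in z if abs(zi) <= k)
    return theta0 + numerator / denominator


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data
    print("mean:", mean(sample))
    print("median:", median(sample))
    print("one-step M-estimate:", one_step_huber(sample))
```

With this sample the outlier contributes only $k$ to the numerator, so the one-step estimate stays close to the median, whereas the sample mean is pulled to 22. Feeding the result back in as the new starting value would give the two-step estimate; this sketch stops after one step.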

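Say that the scale parameter $\sigma$ is known (here $\sigma$ is not necessarily the standard deviation for it does not exist for a distribution like the Cauchy). Two terms of Taylor's expansion of

$$\sum_{i=1}^{n} \Psi\!\left(\frac{X_i - \hat{\theta}}{\sigma}\right) = 0$$

about $\theta$ provides the approximation

$$\sum_{i=1}^{n} \Psi\!\left(\frac{X_i - \theta}{\sigma}\right) - (\hat{\theta} - \theta) \sum_{i=1}^{n} \Psi'\!\left(\frac{X_i - \theta}{\sigma}\right)\frac{1}{\sigma} = 0.$$

This can be rewritten

$$\hat{\theta} - \theta = \frac{\sigma \sum_{i=1}^{n} \Psi\!\left(\dfrac{X_i - \theta}{\sigma}\right)}{\sum_{i=1}^{n} \Psi'\!\left(\dfrac{X_i - \theta}{\sigma}\right)}. \tag{2}$$

For the asymmetric $\Psi$ functions that we have considered

$$E\!\left[\Psi\!\left(\frac{X - \theta}{\sigma}\right)\right] = 0,$$

provided that $X$ has a symmetric distribution about $\theta$. Clearly,

$$\operatorname{var}\!\left[\Psi\!\left(\frac{X - \theta}{\sigma}\right)\right] = E\!\left[\Psi^2\!\left(\frac{X - \theta}{\sigma}\right)\right].$$

Thus Equation (2) can be rewritten as

$$\frac{\sqrt{n}\,(\hat{\theta} - \theta)}{\sigma \sqrt{E[\Psi^2((X - \theta)/\sigma)]} \,\big/\, E[\Psi'((X - \theta)/\sigma)]} = \frac{\sum \Psi\!\left(\dfrac{X_i - \theta}{\sigma}\right) \Big/ \sqrt{n E[\Psi^2((X - \theta)/\sigma)]}}{\sum \Psi'\!\left(\dfrac{X_i - \theta}{\sigma}\right) \Big/ n E[\Psi'((X - \theta)/\sigma)]}. \tag{3}$$

Clearly, by the central limit theorem, the numerator of the right-hand member of Equation (3) has a limiting standardized normal distribution, while the denominator converges in probability to 1. Thus the left-hand member has a limiting distribution that is $N(0, 1)$. In application we must approximate the denominator of the left-hand member. So we say that the robust M-estimator $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance

$$v = \frac{d^2}{n} \cdot \frac{\dfrac{1}{n} \sum_{i=1}^{n} \Psi^2\!\left(\dfrac{x_i - \hat{\theta}_k}{d}\right)}{\left[\dfrac{1}{n} \sum_{i=1}^{n} \Psi'\!\left(\dfrac{x_i - \hat{\theta}_k}{d}\right)\right]^2},$$

where $\hat{\theta}_k$ is the (last) $k$-step estimator of $\theta$. Of course, $\hat{\theta}$ is approximated by $\hat{\theta}_k$; and an approximate 95 percent confidence interval for $\theta$ is given by $\hat{\theta}_k - 1.96\sqrt{v}$ to $\hat{\theta}_k + 1.96\sqrt{v}$.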
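The approximate interval can be sketched numerically. The code below assumes Huber's $\Psi$ and estimates the variance by plugging $d$ and the sample averages of $\Psi^2$ and $\Psi'$ into the limiting variance $\sigma^2 E[\Psi^2] / \{n (E[\Psi'])^2\}$ implied by Equation (3), which simplifies to $d^2 \sum \Psi^2 / (\sum \Psi')^2$. That plug-in form, the function name `huber_interval`, and the data are assumptions made for illustration.

```python
from math import sqrt
from statistics import median


def huber_interval(xs, theta_k, k=1.5, z_crit=1.96):
    """Approximate confidence interval centered at a k-step Huber estimate."""
    center = median(xs)
    d = median([abs(x - center) for x in xs]) / 0.6745
    z = [(x - theta_k) / d for x in xs]
    sum_psi_sq = sum(max(-k, min(k, zi)) ** 2 for zi in z)
    count = sum(1 for zi in z if abs(zi) <= k)  # sum of psi' values
    v = d * d * sum_psi_sq / count ** 2         # plug-in variance estimate
    half_width = z_crit * sqrt(v)
    return theta_k - half_width, theta_k + half_width


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data
    print(huber_interval(sample, theta_k=3.0))
```

With only five observations the normal approximation is crude; the sketch is meant to show the arithmetic, not to recommend the interval for samples this small.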

EXERCISES 

8.21. Verify that the functions $\rho(x)$, $\Psi(x)$, and $w(x)$ given in the text for the normal and double exponential distributions are correct.

8.22. Compute the one-step M-estimate $\hat{\theta}_1$ using Huber's $\Psi$ with $k = 1.5$ if $n = 7$ and the seven observations are 2.1, 5.2, 2.3, 1.4, 2.2, 2.3, and 1.6. Here take $\hat{\theta}_0 = 2.2$, the median of the sample. Compare $\hat{\theta}_1$ with $\bar{x}$.

8.23. Let the p.d.f. $f(x)$ be such that the M-estimator associated with Huber's $\Psi$ function is a maximum likelihood estimator of the location parameter in $f(x - \theta)$. Show that $f(x)$ is of the form $ce^{-\rho_1(x)}$, where $\rho_1(x) = x^2/2$, $|x| \le k$ and $\rho_1(x) = k|x| - k^2/2$, $k < |x|$.

8.24. Plot the $\Psi$ functions associated with the normal, double exponential, and Cauchy distributions in addition to that of Huber. Why is the M-estimator associated with the $\Psi$ function of the Cauchy distribution called a redescending M-estimator?

8.25. Use the data in Exercise 8.22 to find the one-step redescending M-estimator $\hat{\theta}_1$ associated with $\Psi(x) = \sin(x/1.5)$, $|x| \le 1.5\pi$, zero elsewhere. This was first proposed by D. F. Andrews. Compare this to $\bar{x}$ and the one-step M-estimator of Exercise 8.22. [It should be noted that there is no p.d.f. $f(x)$ that could be associated with this $\Psi(x)$ because $\Psi(x) = 0$ if $|x| > 1.5\pi$.]


ADDITIONAL EXERCISES 

8.26. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with $\alpha = 2$ and $\beta = 1/\theta$, $0 < \theta < \infty$.

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$. Is $\hat{\theta}$ unbiased?

(b) What is the approximating distribution of $\hat{\theta}$?

(c) If the prior distribution of the parameter is exponential with mean 2, determine the Bayes' estimator associated with a square-error loss function.

8.27. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with p.d.f. $f(x; \theta) = 3\theta^3 (x + \theta)^{-4}$, $0 < x < \infty$, zero elsewhere, where $0 < \theta$, show that $Y = 2\bar{X}$ is an unbiased estimator of $\theta$ and determine its efficiency.

8.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.

$$f(x; \theta) = \frac{\theta}{(1 + x)^{\theta + 1}}, \qquad 0 < x < \infty,$$

zero elsewhere, where $0 < \theta$.

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$ and argue that it is a complete sufficient statistic for $\theta$. Is $\hat{\theta}$ unbiased?

(b) If $\hat{\theta}$ is adjusted so that it is an unbiased estimator of $\theta$, what is a lower bound for the variance of this unbiased estimator?

8.29. If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\theta, 1)$, find a lower bound for the variance of an estimator of $k(\theta) = \theta^2$. Determine an unbiased minimum variance estimator of $\theta^2$ and then compute its efficiency.

8.30. Suppose that we want to estimate the middle, $\theta$, of a symmetric distribution using a robust estimator because we believe that the tails of this distribution are much thicker than those of a normal distribution. A $t$-distribution with 3 degrees of freedom with center at $\theta$ (not at zero) is such a distribution, so we decide to use the m.l.e., $\hat{\theta}$, associated with that distribution as our robust estimator. Evaluate $\hat{\theta}$ for the five observations: 10.1, 20.7, 11.3, 12.5, 6.0. Here we assume that the spread parameter is equal to 1.

8.31. Consider the normal distribution $N(0, \theta)$. With a random sample $X_1, X_2, \ldots, X_n$ we want to estimate the standard deviation $\sqrt{\theta}$. Find the constant $c$ so that $Y = c \sum_{i=1}^{n} |X_i|$ is an unbiased estimator of $\sqrt{\theta}$ and determine its efficiency.





CHAPTER 9

Theory of Statistical Tests


9.1 Certain Best Tests 

In Chapter 6 we introduced many concepts associated with tests of statistical hypotheses. In this chapter we consider some methods of constructing good statistical tests, beginning with testing a simple hypothesis $H_0$ against a simple alternative hypothesis $H_1$. Thus, in all instances, the parameter space is a set that consists of exactly two points. Under this restriction, we shall do three things:

1. Define a best test for testing $H_0$ against $H_1$.

2. Prove a theorem that provides a method of determining a best test.

3. Give two examples.

Before we define a best test, one important observation should be made. Certainly, a test specifies a critical region; but it can also be said that a choice of a critical region defines a test. For instance, if one is given the critical region $C = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 + x_3^2 \ge 1\}$, the test is determined: Three random variables $X_1, X_2, X_3$ are to be considered; if the observed values are $x_1, x_2, x_3$, accept $H_0$ if $x_1^2 + x_2^2 + x_3^2 < 1$; otherwise, reject $H_0$. That is, the terms “test” and “critical region” can, in this sense, be used interchangeably. Thus, if we define a best critical region, we have defined a best test.

Let $f(x; \theta)$ denote the p.d.f. of a random variable $X$. Let $X_1, X_2, \ldots, X_n$ denote a random sample from this distribution, and consider the two simple hypotheses $H_0 : \theta = \theta'$ and $H_1 : \theta = \theta''$. Thus $\Omega = \{\theta : \theta = \theta', \theta''\}$. We now define a best critical region (and hence a best test) for testing the simple hypothesis $H_0$ against the alternative simple hypothesis $H_1$. In this definition the symbols $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$ and $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1]$ mean $\Pr[(X_1, X_2, \ldots, X_n) \in C]$ when, respectively, $H_0$ and $H_1$ are true.

Definition 1. Let $C$ denote a subset of the sample space. Then $C$ is called a best critical region of size $\alpha$ for testing the simple hypothesis $H_0 : \theta = \theta'$ against the alternative simple hypothesis $H_1 : \theta = \theta''$ if, for every subset $A$ of the sample space for which $\Pr[(X_1, \ldots, X_n) \in A; H_0] = \alpha$:

(a) $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0] = \alpha$.

(b) $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1] \ge \Pr[(X_1, X_2, \ldots, X_n) \in A; H_1]$.

This definition states, in effect, the following: First assume $H_0$ to be true. In general, there will be a multiplicity of subsets $A$ of the sample space such that $\Pr[(X_1, X_2, \ldots, X_n) \in A] = \alpha$. Suppose that there is one of these subsets, say $C$, such that when $H_1$ is true, the power of the test associated with $C$ is at least as great as the power of the test associated with each other $A$. Then $C$ is defined as a best critical region of size $\alpha$ for testing $H_0$ against $H_1$.

In the following example we shall examine this definition in some 
detail and in a very simple case. 

Example 1. Consider the one random variable $X$ that has a binomial distribution with $n = 5$ and $p = \theta$. Let $f(x; \theta)$ denote the p.d.f. of $X$ and let $H_0 : \theta = \frac{1}{2}$ and $H_1 : \theta = \frac{3}{4}$. The following tabulation gives, at points of positive probability density, the values of $f(x; \frac{1}{2})$, $f(x; \frac{3}{4})$, and the ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$.

x                          0         1         2         3         4         5
f(x; 1/2)               1/32      5/32     10/32     10/32      5/32      1/32
f(x; 3/4)             1/1024   15/1024   90/1024  270/1024  405/1024  243/1024
f(x; 1/2)/f(x; 3/4)     32/1      32/3      32/9     32/27     32/81    32/243

We shall use one random value of $X$ to test the simple hypothesis $H_0 : \theta = \frac{1}{2}$ against the alternative simple hypothesis $H_1 : \theta = \frac{3}{4}$, and we shall first assign the significance level of the test to be $\alpha = \frac{1}{32}$. We seek a best critical region of size $\alpha = \frac{1}{32}$. If $A_1 = \{x : x = 0\}$ and $A_2 = \{x : x = 5\}$, then $\Pr(X \in A_1; H_0) = \Pr(X \in A_2; H_0) = \frac{1}{32}$ and there is no other subset $A_3$ of the space $\{x : x = 0, 1, 2, 3, 4, 5\}$ such that $\Pr(X \in A_3; H_0) = \frac{1}{32}$. Then either $A_1$ or $A_2$ is the best critical region $C$ of size $\alpha = \frac{1}{32}$ for testing $H_0$ against $H_1$. We note that $\Pr(X \in A_1; H_0) = \frac{1}{32}$ and that $\Pr(X \in A_1; H_1) = \frac{1}{1024}$. Thus, if the set $A_1$ is used as a critical region of size $\alpha = \frac{1}{32}$, we have the intolerable situation that the probability of rejecting $H_0$ when $H_1$ is true ($H_0$ is false) is much less than the probability of rejecting $H_0$ when $H_0$ is true.

On the other hand, if the set $A_2$ is used as a critical region, then $\Pr(X \in A_2; H_0) = \frac{1}{32}$ and $\Pr(X \in A_2; H_1) = \frac{243}{1024}$. That is, the probability of rejecting $H_0$ when $H_1$ is true is much greater than the probability of rejecting $H_0$ when $H_0$ is true. Certainly, this is a more desirable state of affairs, and actually $A_2$ is the best critical region of size $\alpha = \frac{1}{32}$. The latter statement follows from the fact that, when $H_0$ is true, there are but two subsets, $A_1$ and $A_2$, of the sample space, each of whose probability measure is $\frac{1}{32}$ and the fact that

$$\tfrac{243}{1024} = \Pr(X \in A_2; H_1) > \Pr(X \in A_1; H_1) = \tfrac{1}{1024}.$$

It should be noted, in this problem, that the best critical region $C = A_2$ of size $\alpha = \frac{1}{32}$ is found by including in $C$ the point (or points) at which $f(x; \frac{1}{2})$ is small in comparison with $f(x; \frac{3}{4})$. This is seen to be true once it is observed that the ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$ is a minimum at $x = 5$. Accordingly, the ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$, which is given in the last line of the above tabulation, provides us with a precise tool by which to find a best critical region $C$ for certain given values of $\alpha$. To illustrate this, take $\alpha = \frac{6}{32}$. When $H_0$ is true, each of the subsets $\{x : x = 0, 1\}$, $\{x : x = 0, 4\}$, $\{x : x = 1, 5\}$, $\{x : x = 4, 5\}$ has probability measure $\frac{6}{32}$. By direct computation it is found that the best critical region of this size is $\{x : x = 4, 5\}$. This reflects the fact that the ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$ has its two smallest values for $x = 4$ and $x = 5$. The power of this test, which has $\alpha = \frac{6}{32}$, is

$$\Pr(X = 4, 5; H_1) = \tfrac{405}{1024} + \tfrac{243}{1024} = \tfrac{648}{1024}.$$
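The "direct computation" in Example 1 can be reproduced by enumerating every subset of the sample space. The sketch below uses exact rational arithmetic; the helper name `best_region` is an illustrative choice, and the search is brute force rather than the likelihood-ratio ordering.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

n = 5
p0, p1 = Fraction(1, 2), Fraction(3, 4)   # H0 and H1 values of theta
support = range(n + 1)

f0 = {x: comb(n, x) * p0**x * (1 - p0)**(n - x) for x in support}
f1 = {x: comb(n, x) * p1**x * (1 - p1)**(n - x) for x in support}
ratio = {x: f0[x] / f1[x] for x in support}


def best_region(alpha):
    """Most powerful subset among those with probability exactly alpha under H0."""
    candidates = [
        set(s)
        for r in range(1, n + 2)
        for s in combinations(support, r)
        if sum(f0[x] for x in s) == alpha
    ]
    return max(candidates, key=lambda s: sum(f1[x] for x in s))


if __name__ == "__main__":
    for x in support:
        print(x, f0[x], f1[x], ratio[x])
    for alpha in (Fraction(1, 32), Fraction(6, 32)):
        region = best_region(alpha)
        print(alpha, sorted(region), sum(f1[x] for x in region))
```

The regions found this way consist of the points with the smallest values of the ratio, which is the pattern the Neyman-Pearson theorem formalizes.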

The preceding example should make the following theorem, due to Neyman and Pearson, easier to understand. It is an important theorem because it provides a systematic method of determining a best critical region.

Neyman-Pearson Theorem. Let $X_1, X_2, \ldots, X_n$, where $n$ is a fixed positive integer, denote a random sample from a distribution that has p.d.f. $f(x; \theta)$. Then the joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$

Let $\theta'$ and $\theta''$ be distinct fixed values of $\theta$ so that $\Omega = \{\theta : \theta = \theta', \theta''\}$, and let $k$ be a positive number. Let $C$ be a subset of the sample space such that:

(a) $\dfrac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k$, for each point $(x_1, x_2, \ldots, x_n) \in C$.

(b) $\dfrac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \ge k$, for each point $(x_1, x_2, \ldots, x_n) \in C^*$.

(c) $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$.

Then $C$ is a best critical region of size $\alpha$ for testing the simple hypothesis $H_0 : \theta = \theta'$ against the alternative simple hypothesis $H_1 : \theta = \theta''$.

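Proof. We shall give the proof when the random variables are of the continuous type. If $C$ is the only critical region of size $\alpha$, the theorem is proved. If there is another critical region of size $\alpha$, denote it by $A$. For convenience, we shall let $\int \cdots \int_R L(\theta; x_1, \ldots, x_n)\, dx_1 \cdots dx_n$ be denoted by $\int_R L(\theta)$. In this notation we wish to show that

$$\int_C L(\theta'') - \int_A L(\theta'') \ge 0.$$

Since $C$ is the union of the disjoint sets $C \cap A$ and $C \cap A^*$ and $A$ is the union of the disjoint sets $A \cap C$ and $A \cap C^*$, we have

$$\int_C L(\theta'') - \int_A L(\theta'') = \int_{C \cap A} L(\theta'') + \int_{C \cap A^*} L(\theta'') - \int_{A \cap C} L(\theta'') - \int_{A \cap C^*} L(\theta'') = \int_{C \cap A^*} L(\theta'') - \int_{A \cap C^*} L(\theta''). \tag{1}$$

However, by the hypothesis of the theorem, $L(\theta'') \ge (1/k) L(\theta')$ at each point of $C$, and hence at each point of $C \cap A^*$; thus

$$\int_{C \cap A^*} L(\theta'') \ge \frac{1}{k} \int_{C \cap A^*} L(\theta').$$

But $L(\theta'') \le (1/k) L(\theta')$ at each point of $C^*$, and hence at each point of $A \cap C^*$; accordingly,

$$\int_{A \cap C^*} L(\theta'') \le \frac{1}{k} \int_{A \cap C^*} L(\theta').$$

These inequalities imply that

$$\int_{C \cap A^*} L(\theta'') - \int_{A \cap C^*} L(\theta'') \ge \frac{1}{k} \int_{C \cap A^*} L(\theta') - \frac{1}{k} \int_{A \cap C^*} L(\theta');$$

and, from Equation (1), we obtain

$$\int_C L(\theta'') - \int_A L(\theta'') \ge \frac{1}{k} \left[ \int_{C \cap A^*} L(\theta') - \int_{A \cap C^*} L(\theta') \right]. \tag{2}$$

However,

$$\int_{C \cap A^*} L(\theta') - \int_{A \cap C^*} L(\theta') = \int_{C \cap A^*} L(\theta') + \int_{C \cap A} L(\theta') - \int_{A \cap C} L(\theta') - \int_{A \cap C^*} L(\theta') = \int_C L(\theta') - \int_A L(\theta') = \alpha - \alpha = 0.$$

If this result is substituted in inequality (2), we obtain the desired result,

$$\int_C L(\theta'') - \int_A L(\theta'') \ge 0.$$

If the random variables are of the discrete type, the proof is the same, with integration replaced by summation.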

Remark. As stated in the theorem, conditions (a), (b), and (c) are sufficient ones for region $C$ to be a best critical region of size $\alpha$. However, they are also necessary. We discuss this briefly. Suppose there is a region $A$ of size $\alpha$ that does not satisfy (a) and (b) and that is as powerful at $\theta = \theta''$ as $C$, which satisfies (a), (b), and (c). Then expression (1) would be zero, since the power at $\theta''$ using $A$ is equal to that using $C$. It can be proved that to have expression (1) equal zero $A$ must be of the same form as $C$. As a matter of fact, in the continuous case, $A$ and $C$ would essentially be the same region; that is, they could differ only by a set having probability zero. However, in the discrete case, if $\Pr[L(\theta') = kL(\theta''); H_0]$ is positive, $A$ and $C$ could be different sets, but each would necessarily enjoy conditions (a), (b), and (c) to be a best critical region of size $\alpha$.


One aspect of the theorem to be emphasized is that if we take $C$ to be the set of all points $(x_1, x_2, \ldots, x_n)$ which satisfy

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k, \qquad k > 0,$$

then, in accordance with the theorem, $C$ will be a best critical region. This inequality can frequently be expressed in one of the forms (where $c_1$ and $c_2$ are constants)

$$u_1(x_1, x_2, \ldots, x_n; \theta', \theta'') \le c_1$$

or

$$u_2(x_1, x_2, \ldots, x_n; \theta', \theta'') \ge c_2.$$

Suppose that it is the first form, $u_1 \le c_1$. Since $\theta'$ and $\theta''$ are given constants, $u_1(X_1, X_2, \ldots, X_n; \theta', \theta'')$ is a statistic; and if the p.d.f. of this statistic can be found when $H_0$ is true, then the significance level of the test of $H_0$ against $H_1$ can be determined from this distribution. That is,

$$\alpha = \Pr[u_1(X_1, X_2, \ldots, X_n; \theta', \theta'') \le c_1; H_0].$$

Moreover, the test may be based on this statistic; for, if the observed values of $X_1, X_2, \ldots, X_n$ are $x_1, x_2, \ldots, x_n$, we reject $H_0$ (accept $H_1$) if $u_1(x_1, x_2, \ldots, x_n) \le c_1$.

A positive number $k$ determines a best critical region $C$ whose size is $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$ for that particular $k$. It may be that this value of $\alpha$ is unsuitable for the purpose at hand; that is, it is too large or too small. However, if there is a statistic $u_1(X_1, X_2, \ldots, X_n)$, as in the preceding paragraph, whose p.d.f. can be determined when $H_0$ is true, we need not experiment with various values of $k$ to obtain a desirable significance level. For if the distribution of the statistic is known, or can be found, we may determine $c_1$ such that $\Pr[u_1(X_1, X_2, \ldots, X_n) \le c_1; H_0]$ is a desirable significance level.

An illustrative example follows. 

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the distribution that has the p.d.f.

$$f(x; \theta) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x - \theta)^2}{2}\right], \qquad -\infty < x < \infty.$$

It is desired to test the simple hypothesis $H_0 : \theta = \theta' = 0$ against the alternative simple hypothesis $H_1 : \theta = \theta'' = 1$. Now

$$\frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)} = \frac{(1/\sqrt{2\pi})^n \exp\left[-\left(\sum_1^n x_i^2\right)/2\right]}{(1/\sqrt{2\pi})^n \exp\left[-\left(\sum_1^n (x_i - 1)^2\right)/2\right]} = \exp\left(-\sum_1^n x_i + \frac{n}{2}\right).$$

If $k > 0$, the set of all points $(x_1, x_2, \ldots, x_n)$ such that

$$\exp\left(-\sum_1^n x_i + \frac{n}{2}\right) \le k$$

is a best critical region. This inequality holds if and only if

$$-\sum_1^n x_i + \frac{n}{2} \le \ln k$$

or, equivalently,

$$\sum_1^n x_i \ge \frac{n}{2} - \ln k = c.$$

In this case, a best critical region is the set $C = \{(x_1, x_2, \ldots, x_n) : \sum_1^n x_i \ge c\}$, where $c$ is a constant that can be determined so that the size of the critical region is a desired number $\alpha$. The event $\sum_1^n X_i \ge c$ is equivalent to the event $\bar{X} \ge c/n = c_1$, say, so the test may be based upon the statistic $\bar{X}$. If $H_0$ is true, that is, $\theta = \theta' = 0$, then $\bar{X}$ has a distribution that is $N(0, 1/n)$. For a given positive integer $n$, the size of the sample, and a given significance level $\alpha$, the number $c_1$ can be found from Table III in Appendix B, so that $\Pr(\bar{X} \ge c_1; H_0) = \alpha$. Hence, if the experimental values of $X_1, X_2, \ldots, X_n$ were, respectively, $x_1, x_2, \ldots, x_n$, we would compute $\bar{x} = \sum_1^n x_i / n$. If $\bar{x} \ge c_1$, the simple hypothesis $H_0 : \theta = \theta' = 0$ would be rejected at the significance level $\alpha$; if $\bar{x} < c_1$, the hypothesis $H_0$ would be accepted. The probability of rejecting $H_0$, when $H_0$ is true, is $\alpha$; the probability of rejecting $H_0$, when $H_0$ is false, is the value of the power of the test at $\theta = \theta'' = 1$. That is,

$$\Pr(\bar{X} \ge c_1; H_1) = \int_{c_1}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1/n}} \exp\left[-\frac{(\bar{x} - 1)^2}{2(1/n)}\right] d\bar{x}.$$

For example, if $n = 25$ and if $\alpha$ is selected to be 0.05, then from Table III we find that $c_1 = 1.645/\sqrt{25} = 0.329$. Thus the power of this best test of $H_0$ against $H_1$ is 0.05, when $H_0$ is true, and is

$$\int_{0.329}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1/25}} \exp\left[-\frac{(\bar{x} - 1)^2}{2(1/25)}\right] d\bar{x} = \int_{-3.355}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw = 0.999+,$$

when $H_1$ is true.
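The numbers in Example 2 can be checked without tables by using the normal distribution in Python's standard library. The sketch below is illustrative; the function name `best_test_normal_mean` and its parameter names are not from the text.

```python
from math import sqrt
from statistics import NormalDist


def best_test_normal_mean(n, alpha, theta0=0.0, theta1=1.0):
    """Cutoff c1 for rejecting H0 when xbar >= c1, and the power at theta1.

    Assumes a random sample of size n from N(theta, 1), so that
    xbar is N(theta, 1/n).
    """
    se = 1.0 / sqrt(n)
    c1 = NormalDist(theta0, se).inv_cdf(1.0 - alpha)
    power = 1.0 - NormalDist(theta1, se).cdf(c1)
    return c1, power


if __name__ == "__main__":
    c1, power = best_test_normal_mean(n=25, alpha=0.05)
    print(round(c1, 3), round(power, 4))
```

With $n = 25$ and $\alpha = 0.05$ this gives a cutoff of about 0.329 and a power above 0.999, in agreement with the example.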


There is another aspect of this theorem that warrants special mention. It has to do with the number of parameters that appear in the p.d.f. Our notation suggests that there is but one parameter. However, a careful review of the proof will reveal that nowhere was this needed or assumed. The p.d.f. may depend upon any finite number of parameters. What is essential is that the hypothesis $H_0$ and the alternative hypothesis $H_1$ be simple, namely that they completely specify the distributions. With this in mind, we see that the simple hypotheses $H_0$ and $H_1$ do not need to be hypotheses about the parameters of a distribution, nor, as a matter of fact, do the random variables $X_1, X_2, \ldots, X_n$ need to be independent. That is, if $H_0$ is the simple hypothesis that the joint p.d.f. is $g(x_1, x_2, \ldots, x_n)$, and if $H_1$ is the alternative simple hypothesis that the joint p.d.f. is $h(x_1, x_2, \ldots, x_n)$, then $C$ is a best critical region of size $\alpha$ for testing $H_0$ against $H_1$ if, for $k > 0$:

1. $\dfrac{g(x_1, x_2, \ldots, x_n)}{h(x_1, x_2, \ldots, x_n)} \le k$ for $(x_1, x_2, \ldots, x_n) \in C$.

2. $\dfrac{g(x_1, x_2, \ldots, x_n)}{h(x_1, x_2, \ldots, x_n)} \ge k$ for $(x_1, x_2, \ldots, x_n) \in C^*$.

3. $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$.

An illustrative example follows. 

Example 3. Let $X_1, \ldots, X_n$ denote a random sample from a distribution which has a p.d.f. $f(x)$ that is positive on and only on the nonnegative integers. It is desired to test the simple hypothesis

$$H_0 : f(x) = \frac{e^{-1}}{x!}, \qquad x = 0, 1, 2, \ldots,$$

$= 0$ elsewhere, against the alternative simple hypothesis

$$H_1 : f(x) = \left(\tfrac{1}{2}\right)^{x+1}, \qquad x = 0, 1, 2, \ldots,$$

$= 0$ elsewhere. Here

$$\frac{g(x_1, \ldots, x_n)}{h(x_1, \ldots, x_n)} = \frac{e^{-n}/(x_1!\, x_2! \cdots x_n!)}{(\frac{1}{2})^n (\frac{1}{2})^{x_1 + x_2 + \cdots + x_n}} = \frac{(2e^{-1})^n\, 2^{\sum x_i}}{\prod_1^n (x_i!)}.$$

If $k > 0$, the set of points $(x_1, x_2, \ldots, x_n)$ such that

$$\left(\sum_1^n x_i\right) \ln 2 - \ln\left[\prod_1^n (x_i!)\right] \le \ln k - n \ln (2e^{-1}) = c$$

is a best critical region $C$. Consider the case of $k = 1$ and $n = 1$. The preceding inequality may be written $2^{x_1}/x_1! \le e/2$. This inequality is satisfied by all points in the set $C = \{x_1 : x_1 = 0, 3, 4, 5, \ldots\}$. Thus the power of the test when $H_0$ is true is

$$\Pr(X_1 \in C; H_0) = 1 - \Pr(X_1 = 1, 2; H_0) = 0.448,$$

approximately, in accordance with Table I of Appendix B. The power of the test when $H_1$ is true is given by

$$\Pr(X_1 \in C; H_1) = 1 - \Pr(X_1 = 1, 2; H_1) = 1 - \left(\tfrac{1}{4} + \tfrac{1}{8}\right) = 0.625.$$
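The case $k = 1$, $n = 1$ of Example 3 can be verified by evaluating the ratio $g/h$ point by point. This is a small sketch; `g`, `h`, and `in_region` are illustrative names for the two p.d.f.s and the membership rule.

```python
from math import exp, factorial


def g(x):
    """p.d.f. under H0: Poisson with mean 1."""
    return exp(-1.0) / factorial(x)


def h(x):
    """p.d.f. under H1: (1/2)^(x+1) on the nonnegative integers."""
    return 0.5 ** (x + 1)


def in_region(x, k=1.0):
    """True when g(x)/h(x) <= k, i.e. x belongs to the best critical region."""
    return g(x) / h(x) <= k


if __name__ == "__main__":
    print([x for x in range(8) if in_region(x)])
    # Only x = 1 and x = 2 fall outside C, so both powers are complements.
    size = 1.0 - (g(1) + g(2))
    power = 1.0 - (h(1) + h(2))
    print(round(size, 3), power)
```

The points 1 and 2 are the only ones excluded, which gives the probabilities 0.448 under $H_0$ and 0.625 under $H_1$ found in the example.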

Remark. In the notation of this section, say $C$ is a critical region such that

$$\alpha = \int_C L(\theta') \qquad \text{and} \qquad \beta = \int_{C^*} L(\theta''),$$

so that here $\alpha$ and $\beta$ equal the respective probabilities of the type I and type II errors associated with $C$. Let $d_1$ and $d_2$ be two given positive constants. Consider a certain linear function of $\alpha$ and $\beta$, namely

$$d_1 \int_C L(\theta') + d_2 \int_{C^*} L(\theta'') = d_1 \int_C L(\theta') + d_2 \left[1 - \int_C L(\theta'')\right] = d_2 + \int_C [d_1 L(\theta') - d_2 L(\theta'')].$$

If we wished to minimize this expression, we would select $C$ to be the set of all $(x_1, x_2, \ldots, x_n)$ such that

$$d_1 L(\theta') - d_2 L(\theta'') < 0$$

or, equivalently,

$$\frac{L(\theta')}{L(\theta'')} < \frac{d_2}{d_1}, \qquad \text{for all } (x_1, x_2, \ldots, x_n) \in C,$$

which according to the Neyman-Pearson theorem provides a best critical region with $k = d_2/d_1$. That is, this critical region $C$ is one that minimizes $d_1 \alpha + d_2 \beta$. There could be others, for example, including points on which $L(\theta')/L(\theta'') = d_2/d_1$, but these would still be best critical regions according to the Neyman-Pearson theorem.
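The claim in this remark can be checked by brute force on a small discrete problem. The sketch below reuses the binomial setting of Example 1 ($n = 5$, $\theta' = \frac{1}{2}$, $\theta'' = \frac{3}{4}$) purely as an illustration; the function names and the choices of $d_1$ and $d_2$ are assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

support = range(6)
f0 = [Fraction(comb(5, x), 32) for x in support]                             # L(theta')
f1 = [comb(5, x) * Fraction(3, 4)**x * Fraction(1, 4)**(5 - x) for x in support]  # L(theta'')


def risk(region, d1, d2):
    """d1*alpha + d2*beta for a given critical region."""
    alpha = sum(f0[x] for x in region)
    beta = 1 - sum(f1[x] for x in region)
    return d1 * alpha + d2 * beta


def np_region(d1, d2):
    """Points where d1*L(theta') - d2*L(theta'') < 0."""
    return {x for x in support if d1 * f0[x] - d2 * f1[x] < 0}


def brute_force_min(d1, d2):
    """Smallest risk over all 64 subsets of the sample space."""
    subsets = [set(s) for r in range(7) for s in combinations(support, r)]
    return min(risk(s, d1, d2) for s in subsets)


if __name__ == "__main__":
    for d1, d2 in ((1, 1), (1, 3), (4, 1)):
        region = np_region(d1, d2)
        print(d1, d2, sorted(region), risk(region, d1, d2), brute_force_min(d1, d2))
```

In each case the region defined by the likelihood-ratio inequality attains the minimum found by exhaustive search.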

EXERCISES 

9.1. In Example 2 of this section, let the simple hypotheses read $H_0 : \theta = \theta' = 0$ and $H_1 : \theta = \theta'' = -1$. Show that the best test of $H_0$ against $H_1$ may be carried out by use of the statistic $\bar{X}$, and that if $n = 25$ and $\alpha = 0.05$, the power of the test is 0.999+ when $H_1$ is true.

9.2. Let the random variable $X$ have the p.d.f. $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere. Consider the simple hypothesis $H_0 : \theta = \theta' = 2$ and the alternative hypothesis $H_1 : \theta = \theta'' = 4$. Let $X_1, X_2$ denote a random sample of size 2 from this distribution. Show that the best test of $H_0$ against $H_1$ may be carried out by use of the statistic $X_1 + X_2$ and that the assertion in Example 2 of Section 6.4 is correct.

9.3. Repeat Exercise 9.2 when $H_1 : \theta = \theta'' = 6$. Generalize this for every $\theta'' > 2$.

9.4. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a normal distribution $N(0, \sigma^2)$. Find a best critical region of size $\alpha = 0.05$ for testing $H_0 : \sigma^2 = 1$ against $H_1 : \sigma^2 = 2$. Is this a best critical region of size 0.05 for testing $H_0 : \sigma^2 = 1$ against $H_1 : \sigma^2 = 4$? Against $H_1 : \sigma^2 = \sigma_1^2 > 1$?

9.5. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution having p.d.f. of the form $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, show that a best critical region for testing $H_0 : \theta = 1$ against $H_1 : \theta = 2$ is

$$C = \left\{(x_1, x_2, \ldots, x_n) : c \le \prod_{i=1}^{n} x_i\right\}.$$

9.6. Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a distribution that is $N(\theta_1, \theta_2)$. Find a best test of the simple hypothesis $H_0 : \theta_1 = \theta_1' = 0$, $\theta_2 = \theta_2' = 1$ against the alternative simple hypothesis $H_1 : \theta_1 = \theta_1'' = 1$, $\theta_2 = \theta_2'' = 4$.

9.7. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution $N(\theta, 100)$. Show that $C = \{(x_1, x_2, \ldots, x_n) : c \le \bar{x} = \sum_1^n x_i / n\}$ is a best critical region for testing $H_0 : \theta = 75$ against $H_1 : \theta = 78$. Find $n$ and $c$ so that

$$\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0] = \Pr(\bar{X} \ge c; H_0) = 0.05$$

and

$$\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1] = \Pr(\bar{X} \ge c; H_1) = 0.90,$$

approximately.

9.8. If $X_1, X_2, \ldots, X_n$ is a random sample from a beta distribution with parameters $\alpha = \beta = \theta > 0$, find a best critical region for testing $H_0 : \theta = 1$ against $H_1 : \theta = 2$.

9.9. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution having the p.d.f. $f(x; p) = p^x (1 - p)^{1 - x}$, $x = 0, 1$, zero elsewhere. Show that $C = \{(x_1, \ldots, x_n) : \sum_1^n x_i \le c\}$ is a best critical region for testing $H_0 : p = \frac{1}{2}$ against $H_1 : p = \frac{1}{3}$. Use the central limit theorem to find $n$ and $c$ so that approximately $\Pr(\sum_1^n X_i \le c; H_0) = 0.10$ and $\Pr(\sum_1^n X_i \le c; H_1) = 0.80$.

9.10. Let $X_1, X_2, \ldots, X_{10}$ denote a random sample of size 10 from a Poisson distribution with mean $\theta$. Show that the critical region $C$ defined by $\sum_1^{10} x_i \ge 3$ is a best critical region for testing $H_0 : \theta = 0.1$ against $H_1 : \theta = 0.5$. Determine, for this test, the significance level $\alpha$ and the power at $\theta = 0.5$.


9.2 Uniformly Most Powerful Tests 

This section will take up the problem of a test of a simple hypothesis $H_0$ against an alternative composite hypothesis $H_1$. We begin with an example.

Example 1. Consider the p.d.f.

$$f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad 0 < x < \infty,$$

$= 0$ elsewhere, of Example 2, Section 6.4, and later of Exercise 9.3. It is desired to test the simple hypothesis $H_0 : \theta = 2$ against the alternative composite hypothesis $H_1 : \theta > 2$. Thus $\Omega = \{\theta : \theta \ge 2\}$. A random sample, $X_1, X_2$, of size $n = 2$ will be used, and the critical region is $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$. It was shown in the example cited that the significance level of the test is approximately 0.05 and that the power of the test when $\theta = 4$ is approximately 0.31. The power function $K(\theta)$ of the test for all $\theta \ge 2$ will now be obtained. We have

$$K(\theta) = 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \frac{1}{\theta^2} \exp\left(-\frac{x_1 + x_2}{\theta}\right) dx_1\, dx_2 = \left(\frac{\theta + 9.5}{\theta}\right) e^{-9.5/\theta}, \qquad 2 \le \theta.$$

For example, $K(2) = 0.05$, $K(4) = 0.31$, and $K(9.5) = 2/e$. It is known (Exercise 9.3) that $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$ is a best critical region of size 0.05 for testing the simple hypothesis $H_0 : \theta = 2$ against each simple hypothesis in the composite hypothesis $H_1 : \theta > 2$.
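The closed form of the power function is easy to evaluate. The sketch below relies on the fact that the sum of two independent exponential variables with mean $\theta$ has upper-tail probability $(1 + c/\theta)e^{-c/\theta}$ at $c$; the function name `power_function` is an illustrative choice.

```python
from math import exp


def power_function(theta, c=9.5):
    """K(theta) = Pr(X1 + X2 >= c) for two independent exponentials with mean theta."""
    return (1.0 + c / theta) * exp(-c / theta)


if __name__ == "__main__":
    for theta in (2.0, 4.0, 6.0, 9.5):
        print(theta, round(power_function(theta), 3))
```

The values at $\theta = 2$, 4, and 9.5 agree with those quoted in the example, and the function increases with $\theta$ over the range shown.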

The preceding example affords an illustration of a test of a simple hypothesis $H_0$ that is a best test of $H_0$ against every simple hypothesis in the alternative composite hypothesis $H_1$. We now define a critical region, when it exists, which is a best critical region for testing a simple hypothesis $H_0$ against an alternative composite hypothesis $H_1$. It seems desirable that this critical region should be a best critical region for testing $H_0$ against each simple hypothesis in $H_1$. That is, the power function of the test that corresponds to this critical region should be at least as great as the power function of any other test with the same significance level for every simple hypothesis in $H_1$.

Definition 2. The critical region $C$ is a uniformly most powerful critical region of size $\alpha$ for testing the simple hypothesis $H_0$ against an alternative composite hypothesis $H_1$ if the set $C$ is a best critical region of size $\alpha$ for testing $H_0$ against each simple hypothesis in $H_1$. A test defined by this critical region $C$ is called a uniformly most powerful test, with significance level $\alpha$, for testing the simple hypothesis $H_0$ against the alternative composite hypothesis $H_1$.

As will be seen presently, uniformly most powerful tests do not always exist. However, when they do exist, the Neyman-Pearson theorem provides a technique for finding them. Some illustrative examples are given here.


Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(0, \theta)$, where the variance $\theta$ is an unknown positive number. It will be shown that there exists a uniformly most powerful test with significance level $\alpha$ for testing the simple hypothesis $H_0 : \theta = \theta'$, where $\theta'$ is a fixed positive number, against the alternative composite hypothesis $H_1 : \theta > \theta'$. Thus $\Omega = \{\theta : \theta \ge \theta'\}$. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{1}{2\pi\theta}\right)^{n/2} \exp\left(-\frac{\sum_1^n x_i^2}{2\theta}\right).$$

Let $\theta''$ represent a number greater than $\theta'$, and let $k$ denote a positive number. Let $C$ be the set of points where

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k,$$

that is, the set of points where

$$\left(\frac{\theta''}{\theta'}\right)^{n/2} \exp\left[-\left(\frac{\theta'' - \theta'}{2\theta'\theta''}\right) \sum_1^n x_i^2\right] \le k$$

or, equivalently,

$$\sum_1^n x_i^2 \ge \frac{2\theta'\theta''}{\theta'' - \theta'} \left[\frac{n}{2} \ln\left(\frac{\theta''}{\theta'}\right) - \ln k\right] = c.$$

The set $C = \{(x_1, x_2, \ldots, x_n) : \sum_1^n x_i^2 \ge c\}$ is then a best critical region for testing the simple hypothesis $H_0 : \theta = \theta'$ against the simple hypothesis $\theta = \theta''$. It remains to determine $c$, so that this critical region has the desired size $\alpha$. If $H_0$ is true, the random variable $\sum_1^n X_i^2/\theta'$ has a chi-square distribution with $n$ degrees of freedom. Since $\alpha = \Pr(\sum_1^n X_i^2/\theta' \ge c/\theta'; H_0)$, $c/\theta'$ may be read from Table II in Appendix B and $c$ determined. Then $C = \{(x_1, x_2, \ldots, x_n) : \sum_1^n x_i^2 \ge c\}$ is a best critical region of size $\alpha$ for testing $H_0 : \theta = \theta'$ against the hypothesis $\theta = \theta''$. Moreover, for each number $\theta''$ greater than $\theta'$, the foregoing argument holds. That is, if $\theta'''$ is another number greater than $\theta'$, then $C = \{(x_1, \ldots, x_n) : \sum_1^n x_i^2 \ge c\}$ is a best critical region of size $\alpha$ for testing $H_0 : \theta = \theta'$ against the hypothesis $\theta = \theta'''$. Accordingly, $C = \{(x_1, \ldots, x_n) : \sum_1^n x_i^2 \ge c\}$ is a uniformly most powerful critical region of size $\alpha$ for testing $H_0 : \theta = \theta'$ against $H_1 : \theta > \theta'$. If $x_1, x_2, \ldots, x_n$ denote the experimental values of $X_1, X_2, \ldots, X_n$, then $H_0 : \theta = \theta'$ is rejected at the significance level $\alpha$, and $H_1 : \theta > \theta'$ is accepted, if $\sum_1^n x_i^2 \ge c$; otherwise, $H_0 : \theta = \theta'$ is accepted.

If, in the preceding discussion, we take $n = 15$, $\alpha = 0.05$, and $\theta' = 3$, then here the two hypotheses will be $H_0 : \theta = 3$ and $H_1 : \theta > 3$. From Table II, $c/3 = 25$ and hence $c = 75$.
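The table lookup at the end of Example 2 can be checked numerically. Python's standard library has no chi-square distribution, so the sketch below integrates the chi-square density with Simpson's rule; this numerical approach, the function names, and the step count are assumptions made for illustration, and the density shortcut at zero is valid only for more than 2 degrees of freedom.

```python
from math import exp, gamma


def chi2_pdf(x, df):
    """Chi-square density; returns 0 at x <= 0 (adequate here since df > 2)."""
    if x <= 0.0:
        return 0.0
    return x ** (df / 2.0 - 1.0) * exp(-x / 2.0) / (2.0 ** (df / 2.0) * gamma(df / 2.0))


def chi2_upper_tail(c, df, steps=2000):
    """Pr(chi-square(df) >= c) via composite Simpson's rule on [0, c]."""
    h = c / steps  # steps must be even
    total = chi2_pdf(0.0, df) + chi2_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3.0


if __name__ == "__main__":
    n, c = 15, 75.0
    # Size of the test: theta = 3, so sum(X_i^2)/3 is chi-square(15).
    print(round(chi2_upper_tail(c / 3.0, n), 4))
    # Power at a hypothetical alternative theta = 6.
    print(round(chi2_upper_tail(c / 6.0, n), 4))
```

The first value is close to 0.05, confirming $c = 75$; the second shows how the power of the same region grows once $\theta$ exceeds 3.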


Example 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta, 1)$, where the mean $\theta$ is unknown. It will be
shown that there is no uniformly most powerful test of the simple
hypothesis $H_0: \theta = \theta'$, where $\theta'$ is a fixed number, against the alternative
composite hypothesis $H_1: \theta \ne \theta'$. Thus $\Omega = \{\theta : -\infty < \theta < \infty\}$. Let $\theta''$ be a
number not equal to $\theta'$. Let $k$ be a positive number and consider

$$\frac{(1/2\pi)^{n/2} \exp\left[-\sum_{i=1}^{n}(x_i - \theta')^2/2\right]}{(1/2\pi)^{n/2} \exp\left[-\sum_{i=1}^{n}(x_i - \theta'')^2/2\right]} \le k.$$

The preceding inequality may be written as

$$\exp\left\{-(\theta'' - \theta')\sum_{i=1}^{n} x_i + \frac{n}{2}\left[(\theta'')^2 - (\theta')^2\right]\right\} \le k$$

or

$$(\theta'' - \theta')\sum_{i=1}^{n} x_i \ge \frac{n}{2}\left[(\theta'')^2 - (\theta')^2\right] - \ln k.$$

This last inequality is equivalent to

$$\sum_{i=1}^{n} x_i \ge \frac{n}{2}(\theta'' + \theta') - \frac{\ln k}{\theta'' - \theta'},$$

provided that $\theta'' > \theta'$, and it is equivalent to

$$\sum_{i=1}^{n} x_i \le \frac{n}{2}(\theta'' + \theta') - \frac{\ln k}{\theta'' - \theta'}$$

if $\theta'' < \theta'$. The first of these two expressions defines a best critical region for
testing $H_0: \theta = \theta'$ against the hypothesis $\theta = \theta''$ provided that $\theta'' > \theta'$, while
the second expression defines a best critical region for testing $H_0: \theta = \theta'$
against the hypothesis $\theta = \theta''$ provided that $\theta'' < \theta'$. That is, a best critical
region for testing the simple hypothesis against an alternative simple
hypothesis, say $\theta = \theta' + 1$, will not serve as a best critical region for testing
$H_0: \theta = \theta'$ against the alternative simple hypothesis $\theta = \theta' - 1$, say. By
definition, then, there is no uniformly most powerful test in the case under
consideration.

It should be noted that had the alternative composite hypothesis been
either $H_1: \theta > \theta'$ or $H_1: \theta < \theta'$, a uniformly most powerful test would exist
in each instance.
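
To see numerically why neither one-sided region can serve for the two-sided alternative, the sketch below compares the power of the upper-tail and lower-tail regions at $\theta' + 1$ and $\theta' - 1$. It is an illustration under assumed values ($\theta' = 0$, $n = 4$, $\alpha = 0.05$), none of which come from the text; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_upper(theta, theta0, n, alpha):
    """Power of the region xbar >= theta0 + z_alpha / sqrt(n) when the mean is theta."""
    z = Z.inv_cdf(1.0 - alpha)
    return 1.0 - Z.cdf(z - n ** 0.5 * (theta - theta0))

def power_lower(theta, theta0, n, alpha):
    """Power of the region xbar <= theta0 - z_alpha / sqrt(n) when the mean is theta."""
    z = Z.inv_cdf(1.0 - alpha)
    return Z.cdf(-z - n ** 0.5 * (theta - theta0))

theta0, n, alpha = 0.0, 4, 0.05  # illustrative values only
for theta in (theta0 + 1.0, theta0 - 1.0):
    print(theta,
          round(power_upper(theta, theta0, n, alpha), 4),
          round(power_lower(theta, theta0, n, alpha), 4))
```

Each region has size $\alpha$ and is powerful on its own side of $\theta'$, but its power falls below $\alpha$ on the other side, so neither dominates the other over the whole two-sided alternative.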

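Example 4. In Exercise 9.10 the reader was asked to show that if a
random sample of size $n = 10$ is taken from a Poisson distribution with
mean $\theta$, the critical region defined by $\sum_{i=1}^{10} x_i \ge 3$ is a best critical region for
testing $H_0: \theta = 0.1$ against $H_1: \theta = 0.5$. This critical region is also a uniformly
most powerful one for testing $H_0: \theta = 0.1$ against $H_1: \theta > 0.1$ because, with
$\theta'' > 0.1$,

$$\frac{(0.1)^{\sum x_i} e^{-10(0.1)}/(x_1!\, x_2! \cdots x_n!)}{(\theta'')^{\sum x_i} e^{-10\theta''}/(x_1!\, x_2! \cdots x_n!)} \le k$$

is equivalent to

$$\left(\frac{0.1}{\theta''}\right)^{\sum x_i} e^{-10(0.1 - \theta'')} \le k.$$

Since $\ln(0.1/\theta'') < 0$ when $\theta'' > 0.1$, taking logarithms shows that this
inequality holds if and only if

$$\sum_{i=1}^{10} x_i \ge \frac{\ln k + 10(0.1 - \theta'')}{\ln(0.1/\theta'')},$$

and the region $\sum x_i \ge 3$ is of this form for every $\theta'' > 0.1$.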
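
The size and power of the Poisson region $\sum x_i \ge 3$ can be computed directly, because the sum of $n$ independent Poisson($\theta$) variables is Poisson($n\theta$). The sketch below is illustrative; the function names are hypothetical.

```python
import math

def poisson_upper_tail(mean, c):
    """Pr(S >= c) for S ~ Poisson(mean), with c a nonnegative integer."""
    below = sum(math.exp(-mean) * mean ** s / math.factorial(s) for s in range(c))
    return 1.0 - below

def power(theta, n=10, c=3):
    """Power of the region sum(x_i) >= c for a sample of n Poisson(theta) values."""
    return poisson_upper_tail(n * theta, c)

size = power(0.1)           # probability of rejecting when H0: theta = 0.1 holds
for theta in (0.1, 0.2, 0.5, 1.0):
    print(theta, round(power(theta), 4))
```

The power increases with $\theta$ over the alternative $\theta > 0.1$, as expected for a region of the form $\sum x_i \ge c$.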
Let us make an observation, although obvious when pointed out,
that is important. Let $X_1, X_2, \ldots, X_n$ denote a random sample from
a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Suppose that
$Y = u(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$. In accordance with
the factorization theorem, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ may be
written

$$L(\theta; x_1, x_2, \ldots, x_n) = k_1[u(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n),$$

where $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$. Consequently, the
ratio

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} = \frac{k_1[u(x_1, x_2, \ldots, x_n); \theta']}{k_1[u(x_1, x_2, \ldots, x_n); \theta'']}$$

depends upon $x_1, x_2, \ldots, x_n$ only through $u(x_1, x_2, \ldots, x_n)$. Accordingly,
if there is a sufficient statistic $Y = u(X_1, X_2, \ldots, X_n)$ for $\theta$ and
if a best test or a uniformly most powerful test is desired, there is no
need to consider tests which are based upon any statistic other than the
sufficient statistic. This result supports the importance of sufficiency.
Often, when $\theta'' < \theta'$, the ratio

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)},$$

which depends upon $x_1, x_2, \ldots, x_n$ only through $y = u(x_1, x_2, \ldots, x_n)$,
is an increasing function of $y = u(x_1, x_2, \ldots, x_n)$. In such a case
we say that we have a monotone likelihood ratio in the statistic
$Y = u(X_1, X_2, \ldots, X_n)$.
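Example 5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli
distribution with parameter $p = \theta$, where $0 < \theta < 1$. Let $\theta'' < \theta'$. Then the
ratio is

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} = \frac{(\theta')^{\sum x_i}(1 - \theta')^{n - \sum x_i}}{(\theta'')^{\sum x_i}(1 - \theta'')^{n - \sum x_i}} = \left[\frac{\theta'(1 - \theta'')}{\theta''(1 - \theta')}\right]^{\sum x_i}\left(\frac{1 - \theta'}{1 - \theta''}\right)^{n}.$$

Since $\theta'/\theta'' > 1$ and $(1 - \theta'')/(1 - \theta') > 1$, so that $\theta'(1 - \theta'')/[\theta''(1 - \theta')] > 1$,
the ratio is an increasing function of $y = \sum x_i$. Thus we have a monotone
likelihood ratio in the statistic $Y = \sum X_i$.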
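
The monotonicity claimed in Example 5 is easy to check numerically. The following sketch uses assumed illustrative values $n = 10$, $\theta' = 0.6$, $\theta'' = 0.3$ (not taken from the text) and a hypothetical helper `bernoulli_lr`.

```python
def bernoulli_lr(y, n, theta1, theta2):
    """L(theta1) / L(theta2) for n Bernoulli trials with y = sum(x_i) successes."""
    return (theta1 / theta2) ** y * ((1 - theta1) / (1 - theta2)) ** (n - y)

n, theta_p, theta_pp = 10, 0.6, 0.3   # illustrative values with theta'' < theta'
ratios = [bernoulli_lr(y, n, theta_p, theta_pp) for y in range(n + 1)]

# The ratio should increase strictly with y when theta'' < theta'.
increasing = all(a < b for a, b in zip(ratios, ratios[1:]))
print(increasing)
```

Swapping the two parameter values reverses the direction of monotonicity, which is why the direction of the inequality in the critical region depends on the side of the alternative.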

We can generalize Example 5 by noting the following. Suppose that
the random sample $X_1, X_2, \ldots, X_n$ arises from a p.d.f. representing a
regular case of the exponential class, namely

$$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)], \quad x \in \mathscr{A},$$

and $f(x; \theta) = 0$ elsewhere, where the space $\mathscr{A}$ of $X$ is free of $\theta$. Further
assume that $p(\theta)$ is an increasing function of $\theta$. Then

$$\frac{L(\theta')}{L(\theta'')} = \frac{\exp\left[p(\theta')\sum_{1}^{n} K(x_i) + \sum_{1}^{n} S(x_i) + nq(\theta')\right]}{\exp\left[p(\theta'')\sum_{1}^{n} K(x_i) + \sum_{1}^{n} S(x_i) + nq(\theta'')\right]} = \exp\left\{[p(\theta') - p(\theta'')]\sum_{1}^{n} K(x_i) + n[q(\theta') - q(\theta'')]\right\}.$$

If $\theta'' < \theta'$, $p(\theta)$ being an increasing function requires this ratio to be
an increasing function of $y = \sum_{1}^{n} K(x_i)$. Thus we have a monotone
likelihood ratio in the statistic $Y = \sum_{1}^{n} K(X_i)$. Moreover, if we test
$H_0: \theta = \theta'$ against $H_1: \theta < \theta'$, then, with $\theta'' < \theta'$, we see that

$$\frac{L(\theta')}{L(\theta'')} \le k$$

is equivalent to $\sum K(x_i) \le c$ for every $\theta'' < \theta'$. That is, this provides a
uniformly most powerful critical region.

If, in the preceding situation with monotone likelihood ratio, we
test $H_0: \theta = \theta'$ against $H_1: \theta > \theta'$, then $\sum K(x_i) \ge c$ would be a
uniformly most powerful critical region. From the likelihood ratios
displayed in Examples 2, 3, 4, and 5 we see immediately that the
respective critical regions

$$\sum_{i=1}^{n} x_i^2 \ge c, \qquad \sum_{i=1}^{n} x_i \ge c, \qquad \sum_{i=1}^{n} x_i \ge c, \qquad \sum_{i=1}^{n} x_i \ge c$$

are uniformly most powerful for testing $H_0: \theta = \theta'$ against $H_1: \theta > \theta'$.
There is a final remark that should be made about uniformly most
powerful tests. Of course, in Definition 2, the word uniformly is
associated with $\theta$; that is, $C$ is a best critical region of size $\alpha$ for testing
$H_0: \theta = \theta_0$ against all $\theta$ values given by the composite alternative $H_1$.
However, suppose that the form of such a region is

$$u(x_1, x_2, \ldots, x_n) \le c.$$

Then this form provides uniformly most powerful critical regions for
all attainable $\alpha$ values by, of course, appropriately changing the value
of $c$. That is, there is a certain uniformity property, also associated
with $\alpha$, that is not always noted in statistics texts.

EXERCISES 

9.11. Let $X$ have the p.d.f. $f(x; \theta) = \theta^x(1 - \theta)^{1-x}$, $x = 0, 1$, zero elsewhere.
We test the simple hypothesis $H_0: \theta = \frac{1}{4}$ against the alternative composite
hypothesis $H_1: \theta < \frac{1}{4}$ by taking a random sample of size 10 and rejecting
$H_0: \theta = \frac{1}{4}$ if and only if the observed values $x_1, x_2, \ldots, x_{10}$ of the sample
observations are such that $\sum_{i=1}^{10} x_i \le 1$. Find the power function $K(\theta)$,
$0 < \theta \le \frac{1}{4}$, of this test.

9.12. Let $X$ have a p.d.f. of the form $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere.
Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random sample of
size 4 from this distribution. Let the observed value of $Y_4$ be $y_4$. We reject
$H_0: \theta = 1$ and accept $H_1: \theta \ne 1$ if either $y_4 \le \frac{1}{2}$ or $y_4 > 1$. Find the power
function $K(\theta)$, $0 < \theta$, of the test.

9.13. Consider a normal distribution of the form $N(\theta, 4)$. The simple
hypothesis $H_0: \theta = 0$ is rejected, and the alternative composite hypothesis
$H_1: \theta > 0$ is accepted if and only if the observed mean $\bar{x}$ of a random sample
of size 25 is greater than or equal to Find the power function $K(\theta)$, $0 \le \theta$,
of this test.

9.14. Consider the two normal distributions $N(\mu_1, 400)$ and $N(\mu_2, 225)$. Let
$\theta = \mu_1 - \mu_2$. Let $\bar{x}$ and $\bar{y}$ denote the observed means of two independent
random samples, each of size $n$, from these two distributions. We reject
$H_0: \theta = 0$ and accept $H_1: \theta > 0$ if and only if $\bar{x} - \bar{y} \ge c$. If $K(\theta)$ is the
power function of this test, find $n$ and $c$ so that $K(0) = 0.05$ and $K(10) = 0.90$,
approximately.

9.15. If, in Example 2 of this section, $H_0: \theta = \theta'$, where $\theta'$ is a fixed positive
number, and $H_1: \theta < \theta'$, show that the set $\{(x_1, x_2, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \le c\}$ is
a uniformly most powerful critical region for testing $H_0$ against $H_1$.

9.16. If, in Example 2 of this section, $H_0: \theta = \theta'$, where $\theta'$ is a fixed positive
number, and $H_1: \theta \ne \theta'$, show that there is no uniformly most powerful test
for testing $H_0$ against $H_1$.

9.17. Let $X_1, X_2, \ldots, X_{25}$ denote a random sample of size 25 from a normal
distribution $N(\theta, 100)$. Find a uniformly most powerful critical region of
size $\alpha = 0.10$ for testing $H_0: \theta = 75$ against $H_1: \theta > 75$.

9.18. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution
$N(\theta, 16)$. Find the sample size $n$ and a uniformly most powerful test of
$H_0: \theta = 25$ against $H_1: \theta < 25$ with power function $K(\theta)$ so that
approximately $K(25) = 0.10$ and $K(23) = 0.90$.

9.19. Consider a distribution having a p.d.f. of the form
$f(x; \theta) = \theta^x(1 - \theta)^{1-x}$, $x = 0, 1$, zero elsewhere. Let $H_0: \theta = \frac{1}{20}$ and $H_1: \theta > \frac{1}{20}$. Use
the central limit theorem to determine the sample size $n$ of a random sample
so that a uniformly most powerful test of $H_0$ against $H_1$ has a power function
$K(\theta)$, with approximately $K(\frac{1}{20}) = 0.05$ and $K(\frac{1}{10}) = 0.90$.

9.20. Illustrative Example 1 of this section dealt with a random sample of size
$n = 2$ from a gamma distribution with $\alpha = 1$, $\beta = \theta$. Thus the m.g.f. of the
distribution is $(1 - \theta t)^{-1}$, $t < 1/\theta$, $\theta \ge 2$. Let $Z = X_1 + X_2$. Show that $Z$
has a gamma distribution with $\alpha = 2$, $\beta = \theta$. Express the power function
$K(\theta)$ of Example 1 in terms of a single integral. Generalize this for a random
sample of size $n$.

9.21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, where $\theta > 0$. Find a sufficient
statistic for $\theta$ and show that a uniformly most powerful test of $H_0: \theta = 6$
against $H_1: \theta < 6$ is based on this statistic.

9.22. Let $X$ have the p.d.f. $f(x; \theta) = \theta^x(1 - \theta)^{1-x}$, $x = 0, 1$, zero elsewhere.
We test $H_0: \theta = \frac{1}{2}$ against $H_1: \theta < \frac{1}{2}$ by taking a random sample
$X_1, X_2, \ldots, X_5$ of size $n = 5$ and rejecting $H_0$ if $Y = \sum_{i=1}^{5} X_i$ is observed to be
less than or equal to a constant $c$.

(a) Show that this is a uniformly most powerful test.

(b) Find the significance level when $c = 1$.

(c) Find the significance level when $c = 0$.

(d) By using a randomized test, modify the tests given in parts (b) and (c)
to find a test with significance level $\alpha$
9.3 Likelihood Ratio Tests

The notion of using the magnitude of the ratio of two probability
density functions as the basis of a best test or of a uniformly most
powerful test can be modified, and made intuitively appealing, to
provide a method of constructing a test of a composite hypothesis
against an alternative composite hypothesis or of constructing a test
of a simple hypothesis against an alternative composite hypothesis
when a uniformly most powerful test does not exist. This method leads
to tests called likelihood ratio tests. A likelihood ratio test, as just
remarked, is not necessarily a uniformly most powerful test, but it has
been proved in the literature that such a test often has desirable
properties.

A certain terminology and notation will be introduced by means
of an example.


Example 1. Let the random variable $X$ be $N(\theta_1, \theta_2)$ and let the parameter
space be $\Omega = \{(\theta_1, \theta_2) : -\infty < \theta_1 < \infty,\ 0 < \theta_2 < \infty\}$. Let the composite
hypothesis be $H_0: \theta_1 = 0$, $\theta_2 > 0$, and let the alternative composite hypothesis
be $H_1: \theta_1 \ne 0$, $\theta_2 > 0$. The set $\omega = \{(\theta_1, \theta_2) : \theta_1 = 0,\ 0 < \theta_2 < \infty\}$ is a subset
of $\Omega$ and will be called the subspace specified by the hypothesis $H_0$. Then, for
instance, the hypothesis $H_0$ may be described as $H_0: (\theta_1, \theta_2) \in \omega$. It is proposed
that we test $H_0$ against all alternatives in $H_1$.

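Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n > 1$ from the
distribution of this example. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is, at each
point in $\Omega$,

$$L(\theta_1, \theta_2; x_1, \ldots, x_n) = \left(\frac{1}{2\pi\theta_2}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n}(x_i - \theta_1)^2}{2\theta_2}\right] = L(\Omega).$$

At each point $(\theta_1, \theta_2) \in \omega$, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(0, \theta_2; x_1, \ldots, x_n) = \left(\frac{1}{2\pi\theta_2}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n} x_i^2}{2\theta_2}\right] = L(\omega).$$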


The joint p.d.f., now denoted by $L(\omega)$, is not completely specified, since $\theta_2$ may
be any positive number; nor is the joint p.d.f., now denoted by $L(\Omega)$,
completely specified, since $\theta_1$ may be any real number and $\theta_2$ any positive
number. Thus the ratio of $L(\omega)$ to $L(\Omega)$ could not provide a basis for a
test of $H_0$ against $H_1$. Suppose, however, that we modify this ratio in the
following manner. We shall find the maximum of $L(\omega)$ in $\omega$, that is, the
maximum of $L(\omega)$ with respect to $\theta_2$. And we shall find the maximum of
$L(\Omega)$ in $\Omega$, that is, the maximum of $L(\Omega)$ with respect to $\theta_1$ and $\theta_2$. The ratio
of these maxima will be taken as the criterion for a test of $H_0$ against $H_1$.
Let the maximum of $L(\omega)$ in $\omega$ be denoted by $L(\hat{\omega})$ and let the maximum of
$L(\Omega)$ in $\Omega$ be denoted by $L(\hat{\Omega})$. Then the criterion for the test of $H_0$ against
$H_1$ is the likelihood ratio

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})}.$$

Since $L(\omega)$ and $L(\Omega)$ are probability density functions, $\lambda \ge 0$; and since $\omega$ is
a subset of $\Omega$, $\lambda \le 1$.

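In our example the maximum of $L(\omega)$ is obtained by first setting

$$\frac{d \ln L(\omega)}{d\theta_2} = -\frac{n}{2\theta_2} + \frac{\sum_{1}^{n} x_i^2}{2\theta_2^2}$$

equal to zero and solving for $\theta_2$. The solution for $\theta_2$ is $\sum_{1}^{n} x_i^2/n$, and this number
maximizes $L(\omega)$. Thus the maximum is

$$L(\hat{\omega}) = \left(\frac{1}{2\pi\sum_{1}^{n} x_i^2/n}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n} x_i^2}{2\sum_{1}^{n} x_i^2/n}\right] = \left(\frac{ne^{-1}}{2\pi\sum_{1}^{n} x_i^2}\right)^{n/2}.$$

On the other hand, by using Example 4, Section 6.1, the maximum, $L(\hat{\Omega})$, of
$L(\Omega)$ is obtained by replacing $\theta_1$ and $\theta_2$ by $\sum_{1}^{n} x_i/n = \bar{x}$ and $\sum_{1}^{n}(x_i - \bar{x})^2/n$,
respectively. That is,

$$L(\hat{\Omega}) = \left[\frac{1}{2\pi\sum_{1}^{n}(x_i - \bar{x})^2/n}\right]^{n/2} \exp\left[-\frac{\sum_{1}^{n}(x_i - \bar{x})^2}{2\sum_{1}^{n}(x_i - \bar{x})^2/n}\right] = \left[\frac{ne^{-1}}{2\pi\sum_{1}^{n}(x_i - \bar{x})^2}\right]^{n/2}.$$

Thus here

$$\lambda = \left[\frac{\sum_{1}^{n}(x_i - \bar{x})^2}{\sum_{1}^{n} x_i^2}\right]^{n/2}.$$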
Because $\sum_{1}^{n} x_i^2 = \sum_{1}^{n}(x_i - \bar{x})^2 + n\bar{x}^2$, $\lambda$ may be written

$$\lambda = \frac{1}{\left\{1 + \left[n\bar{x}^2 \Big/ \sum_{1}^{n}(x_i - \bar{x})^2\right]\right\}^{n/2}}.$$

Now the hypothesis $H_0$ is $\theta_1 = 0$, $\theta_2 > 0$. If the observed number $\bar{x}$ were zero,
the experiment tends to confirm $H_0$. But if $\bar{x} = 0$ and $\sum_{1}^{n} x_i^2 > 0$, then $\lambda = 1$.
On the other hand, if $\bar{x}$ and $n\bar{x}^2/\sum_{1}^{n}(x_i - \bar{x})^2$ deviate considerably from zero,
the experiment tends to negate $H_0$. Now the greater the deviation of
$n\bar{x}^2/\sum_{1}^{n}(x_i - \bar{x})^2$ from zero, the smaller $\lambda$ becomes. That is, if $\lambda$ is used as a test
criterion, then an intuitively appealing critical region for testing $H_0$ is a set
defined by $0 \le \lambda \le \lambda_0$, where $\lambda_0$ is a positive proper fraction. Thus we reject
$H_0$ if $\lambda \le \lambda_0$. A test that has the critical region $\lambda \le \lambda_0$ is a likelihood ratio test.
In this example $\lambda \le \lambda_0$ when and only when

$$\frac{\sqrt{n}\,|\bar{x}|}{\sqrt{\sum_{1}^{n}(x_i - \bar{x})^2/(n - 1)}} \ge \sqrt{(n - 1)(\lambda_0^{-2/n} - 1)} = c.$$

If $H_0: \theta_1 = 0$ is true, the results in Section 4.8 show that the statistic

$$t(X_1, X_2, \ldots, X_n) = \frac{\sqrt{n}\,(\bar{X} - 0)}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}}$$

has a $t$-distribution with $n - 1$ degrees of freedom. Accordingly, in this
example the likelihood ratio test of $H_0$ against $H_1$ may be based on a $t$-statistic.
For a given positive integer $n$, Table IV in Appendix B may be used (with $n - 1$
degrees of freedom) to determine the number $c$ such that
$\alpha = \Pr[|t(X_1, X_2, \ldots, X_n)| \ge c;\ H_0]$ is the desired significance level of the test.
If the experimental values of $X_1, X_2, \ldots, X_n$ are, respectively, $x_1, x_2, \ldots, x_n$,
then we reject $H_0$ if and only if $|t(x_1, x_2, \ldots, x_n)| \ge c$. If, for instance, $n = 6$
and $\alpha = 0.05$, then from Table IV, $c = 2.571$.
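
The relation between $\lambda$ and the $t$-statistic in this example can be checked on data. The sketch below is illustrative only: the sample values are made up, and `lrt_one_sample` is a hypothetical helper name. It computes $\lambda$ from the two sums of squares and confirms that it equals $[1 + t^2/(n-1)]^{-n/2}$, which is why rejecting for small $\lambda$ is the same as rejecting for large $|t|$.

```python
import math

def lrt_one_sample(xs):
    """Return (lam, t) for testing H0: mean = 0 with the variance unknown."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squares about the sample mean
    ss0 = sum(x ** 2 for x in xs)           # sum of squares about zero
    lam = (ss / ss0) ** (n / 2)
    t = math.sqrt(n) * xbar / math.sqrt(ss / (n - 1))
    return lam, t

xs = [0.8, -0.3, 1.9, 0.4, 1.1, 0.6]        # made-up data with n = 6
lam, t = lrt_one_sample(xs)
lam_from_t = (1 + t ** 2 / (len(xs) - 1)) ** (-len(xs) / 2)

# 2.571 is the cutoff quoted in the example for n = 6 and alpha = 0.05.
reject = abs(t) >= 2.571
print(round(lam, 4), round(t, 3), reject)
```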



The preceding example should make the following generalization
easier to read: Let $X_1, X_2, \ldots, X_n$ denote $n$ independent random
variables having, respectively, the probability density functions
$f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m)$, $i = 1, 2, \ldots, n$. The set that consists of all
parameter points $(\theta_1, \theta_2, \ldots, \theta_m)$ is denoted by $\Omega$, which we have
called the parameter space. Let $\omega$ be a subset of the parameter
space $\Omega$. We wish to test the (simple or composite) hypothesis
$H_0: (\theta_1, \theta_2, \ldots, \theta_m) \in \omega$ against all alternative hypotheses. Define the
likelihood functions

$$L(\omega) = \prod_{i=1}^{n} f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m), \quad (\theta_1, \theta_2, \ldots, \theta_m) \in \omega,$$

and

$$L(\Omega) = \prod_{i=1}^{n} f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m), \quad (\theta_1, \theta_2, \ldots, \theta_m) \in \Omega.$$

Let $L(\hat{\omega})$ and $L(\hat{\Omega})$ be the maxima, which we assume to exist, of these
two likelihood functions. The ratio of $L(\hat{\omega})$ to $L(\hat{\Omega})$ is called the
likelihood ratio and is denoted by

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})}.$$

Let $\lambda_0$ be a positive proper fraction. The likelihood ratio test principle
states that the hypothesis $H_0: (\theta_1, \theta_2, \ldots, \theta_m) \in \omega$ is rejected if and
only if

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda \le \lambda_0.$$

The function $\lambda$ defines a random variable $\lambda(X_1, X_2, \ldots, X_n)$, and the
significance level of the test is given by

$$\alpha = \Pr[\lambda(X_1, X_2, \ldots, X_n) \le \lambda_0;\ H_0].$$

The likelihood ratio test principle is an intuitive one. However,
the principle does lead to the same test, when testing a simple
hypothesis $H_0$ against an alternative simple hypothesis $H_1$, as that given
by the Neyman-Pearson theorem (Exercise 9.25). Thus it might be
expected that a test based on this principle has some desirable
properties.

An example of the preceding generalization will be given. 


Example 2. Let the independent random variables $X$ and $Y$ have
distributions that are $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_3)$, where the means $\theta_1$ and $\theta_2$ and
common variance $\theta_3$ are unknown. Then
$\Omega = \{(\theta_1, \theta_2, \theta_3) : -\infty < \theta_1 < \infty,\ -\infty < \theta_2 < \infty,\ 0 < \theta_3 < \infty\}$.
Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ denote
independent random samples from these distributions. The hypothesis
$H_0: \theta_1 = \theta_2$, unspecified, and $\theta_3$ unspecified, is to be tested against all
alternatives. Then $\omega = \{(\theta_1, \theta_2, \theta_3) : -\infty < \theta_1 = \theta_2 < \infty,\ 0 < \theta_3 < \infty\}$. Here
$X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m$ are $n + m > 2$ mutually independent random
variables having the likelihood functions


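$$L(\omega) = \left(\frac{1}{2\pi\theta_3}\right)^{(n+m)/2} \exp\left\{-\frac{\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_1)^2}{2\theta_3}\right\}$$

and

$$L(\Omega) = \left(\frac{1}{2\pi\theta_3}\right)^{(n+m)/2} \exp\left\{-\frac{\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_2)^2}{2\theta_3}\right\}.$$

If

$$\frac{\partial \ln L(\omega)}{\partial\theta_1} \quad \text{and} \quad \frac{\partial \ln L(\omega)}{\partial\theta_3}$$

are equated to zero, then (Exercise 9.26)

$$\begin{aligned}
\sum_{1}^{n}(x_i - \theta_1) + \sum_{1}^{m}(y_i - \theta_1) &= 0, \\
-(n + m) + \frac{1}{\theta_3}\left[\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_1)^2\right] &= 0.
\end{aligned} \tag{1}$$

The solutions for $\theta_1$ and $\theta_3$ are, respectively,

$$u = \frac{\sum_{1}^{n} x_i + \sum_{1}^{m} y_i}{n + m}$$

and

$$w = \frac{\sum_{1}^{n}(x_i - u)^2 + \sum_{1}^{m}(y_i - u)^2}{n + m},$$

and $u$ and $w$ maximize $L(\omega)$. The maximum is

$$L(\hat{\omega}) = \left(\frac{e^{-1}}{2\pi w}\right)^{(n+m)/2}.$$

In like manner, if

$$\frac{\partial \ln L(\Omega)}{\partial\theta_1}, \quad \frac{\partial \ln L(\Omega)}{\partial\theta_2}, \quad \frac{\partial \ln L(\Omega)}{\partial\theta_3}$$

are equated to zero, then (Exercise 9.27)

$$\begin{aligned}
\sum_{1}^{n}(x_i - \theta_1) &= 0, \\
\sum_{1}^{m}(y_i - \theta_2) &= 0, \\
-(n + m) + \frac{1}{\theta_3}\left[\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_2)^2\right] &= 0.
\end{aligned} \tag{2}$$

The solutions for $\theta_1$, $\theta_2$, and $\theta_3$ are, respectively,

$$u_1 = \frac{\sum_{1}^{n} x_i}{n}, \qquad u_2 = \frac{\sum_{1}^{m} y_i}{m}, \qquad w' = \frac{\sum_{1}^{n}(x_i - u_1)^2 + \sum_{1}^{m}(y_i - u_2)^2}{n + m},$$

and $u_1$, $u_2$, and $w'$ maximize $L(\Omega)$. The maximum is

$$L(\hat{\Omega}) = \left(\frac{e^{-1}}{2\pi w'}\right)^{(n+m)/2},$$

so that

$$\lambda(x_1, \ldots, x_n, y_1, \ldots, y_m) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \left(\frac{w'}{w}\right)^{(n+m)/2}.$$

The random variable defined by $\lambda^{2/(n+m)}$ is

$$\frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sum_{1}^{n}\{X_i - [(n\bar{X} + m\bar{Y})/(n + m)]\}^2 + \sum_{1}^{m}\{Y_i - [(n\bar{X} + m\bar{Y})/(n + m)]\}^2}.$$

Now

$$\sum_{1}^{n}\left(X_i - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \sum_{1}^{n}\left[(X_i - \bar{X}) + \left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)\right]^2 = \sum_{1}^{n}(X_i - \bar{X})^2 + n\left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2$$

and

$$\sum_{1}^{m}\left(Y_i - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \sum_{1}^{m}\left[(Y_i - \bar{Y}) + \left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)\right]^2 = \sum_{1}^{m}(Y_i - \bar{Y})^2 + m\left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2.$$

But

$$n\left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \frac{m^2 n}{(n + m)^2}(\bar{X} - \bar{Y})^2$$

and

$$m\left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \frac{n^2 m}{(n + m)^2}(\bar{X} - \bar{Y})^2.$$

Hence the random variable defined by $\lambda^{2/(n+m)}$ may be written

$$\frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2 + [nm/(n + m)](\bar{X} - \bar{Y})^2} = \frac{1}{1 + \dfrac{[nm/(n + m)](\bar{X} - \bar{Y})^2}{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}}.$$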

If the hypothesis $H_0: \theta_1 = \theta_2$ is true, the random variable

$$T = \frac{\sqrt{\dfrac{nm}{n + m}}\,(\bar{X} - \bar{Y})}{\sqrt{\dfrac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{n + m - 2}}}$$

has, in accordance with Section 6.3, a $t$-distribution with $n + m - 2$ degrees
of freedom. Thus the random variable defined by $\lambda^{2/(n+m)}$ is

$$\frac{n + m - 2}{(n + m - 2) + T^2}.$$

The test of $H_0$ against all alternatives may then be based on a $t$-distribution
with $n + m - 2$ degrees of freedom.

The likelihood ratio principle calls for the rejection of $H_0$ if and only if
$\lambda \le \lambda_0 < 1$. Thus the significance level of the test is

$$\alpha = \Pr[\lambda(X_1, \ldots, X_n, Y_1, \ldots, Y_m) \le \lambda_0;\ H_0].$$

However, $\lambda(X_1, \ldots, X_n, Y_1, \ldots, Y_m) \le \lambda_0$ is equivalent to $|T| \ge c$, and so

$$\alpha = \Pr(|T| \ge c;\ H_0).$$

For given values of $n$ and $m$, the number $c$ is determined from Table IV in
Appendix B (with $n + m - 2$ degrees of freedom) in such a manner as to yield
a desired $\alpha$. Then $H_0$ is rejected at a significance level $\alpha$ if and only if $|t| \ge c$,
where $t$ is the experimental value of $T$. If, for instance, $n = 10$, $m = 6$, and
$\alpha = 0.05$, then $c = 2.145$.
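
The algebra of Example 2 can be verified numerically. In the sketch below, which uses made-up data and a hypothetical helper `lrt_two_sample`, $\lambda$ is computed from the pooled and within-sample sums of squares and compared with $(n + m - 2)/[(n + m - 2) + t^2]$.

```python
import math

def lrt_two_sample(xs, ys):
    """Return (lam, t) for testing equal means with a common unknown variance."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    u = (sum(xs) + sum(ys)) / (n + m)       # pooled mean, the estimate under H0
    within = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    pooled = sum((x - u) ** 2 for x in xs) + sum((y - u) ** 2 for y in ys)
    lam = (within / pooled) ** ((n + m) / 2)
    t = math.sqrt(n * m / (n + m)) * (xbar - ybar) / math.sqrt(within / (n + m - 2))
    return lam, t

xs = [5.1, 4.8, 6.0, 5.5, 4.9]              # made-up samples
ys = [4.2, 4.9, 4.4, 5.0]
lam, t = lrt_two_sample(xs, ys)
df = len(xs) + len(ys) - 2
lhs = lam ** (2 / (len(xs) + len(ys)))
rhs = df / (df + t ** 2)
print(round(lhs, 6), round(rhs, 6))
```

Because $\lambda$ is a decreasing function of $t^2$, small values of $\lambda$ correspond exactly to large values of $|t|$.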

In each of the two examples of this section it was found that the
likelihood ratio test could be based on a statistic which, when the
hypothesis $H_0$ is true, has a $t$-distribution. To help us compute the
powers of these tests at parameter points other than those described
by the hypothesis $H_0$, we turn to the following definition.

Definition 3. Let the random variable $W$ be $N(\delta, 1)$; let the random
variable $V$ be $\chi^2(r)$, and let $W$ and $V$ be independent. The quotient

$$T = \frac{W}{\sqrt{V/r}}$$

is said to have a noncentral $t$-distribution with $r$ degrees of freedom and
noncentrality parameter $\delta$. If $\delta = 0$, we say that $T$ has a central
$t$-distribution.


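In the light of this definition, let us reexamine the statistics of the
examples of this section. In Example 1 we had

$$t(X_1, \ldots, X_n) = \frac{\sqrt{n}\,\bar{X}}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}} = \frac{\sqrt{n}\,\bar{X}/\sigma}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/[\sigma^2(n - 1)]}} = \frac{W_1}{\sqrt{V_1/(n - 1)}}.$$

Here $W_1 = \sqrt{n}\,\bar{X}/\sigma$ is $N(\sqrt{n}\,\theta_1/\sigma, 1)$, $V_1 = \sum_{1}^{n}(X_i - \bar{X})^2/\sigma^2$ is $\chi^2(n - 1)$,
and $W_1$ and $V_1$ are independent. Thus, if $\theta_1 \ne 0$, we see, in accordance
with the definition, that $t(X_1, \ldots, X_n)$ has a noncentral $t$-distribution
with $n - 1$ degrees of freedom and noncentrality parameter
$\delta_1 = \sqrt{n}\,\theta_1/\sigma$. In Example 2 we had

$$T = \frac{W_2}{\sqrt{V_2/(n + m - 2)}},$$

where

$$W_2 = \sqrt{\frac{nm}{n + m}}\,\frac{\bar{X} - \bar{Y}}{\sigma}, \qquad V_2 = \frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sigma^2}.$$

Here $W_2$ is $N[\sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma, 1]$, $V_2$ is $\chi^2(n + m - 2)$, and $W_2$
and $V_2$ are independent. Accordingly, if $\theta_1 \ne \theta_2$, $T$ has a noncentral
$t$-distribution with $n + m - 2$ degrees of freedom and noncentrality
parameter $\delta_2 = \sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma$. It is interesting to note that
$\delta_1 = \sqrt{n}\,\theta_1/\sigma$ measures the deviation of $\theta_1$ from $\theta_1 = 0$ in units of the
standard deviation $\sigma/\sqrt{n}$ of $\bar{X}$. The noncentrality parameter
$\delta_2 = \sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma$ is equal to the deviation of $\theta_1 - \theta_2$ from
$\theta_1 - \theta_2 = 0$ in units of the standard deviation $\sigma\sqrt{(n + m)/nm}$ of $\bar{X} - \bar{Y}$.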

There are various tables of the noncentral $t$-distribution, but they
are much too cumbersome to be included in this book. However, with 
the aid of such tables, we can determine the power functions of these 
tests as functions of the noncentrality parameters. 
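
In place of tables, the power can be approximated by simulation straight from Definition 3. The sketch below is an illustration, not a method from the text: it draws $W$ and $V$ independently, forms $W/\sqrt{V/r}$, and estimates $\Pr(|T| \ge c)$. The function names, the number of draws, and the seed are arbitrary choices; the cutoff 2.571 with $r = 5$ degrees of freedom is the value quoted in Example 1 for $n = 6$.

```python
import math
import random

def noncentral_t_draw(rng, r, delta):
    """One draw of W / sqrt(V / r), with W ~ N(delta, 1) and V ~ chi-square(r)."""
    w = rng.gauss(delta, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r))
    return w / math.sqrt(v / r)

def estimated_power(r, delta, c, draws=20000, seed=0):
    """Monte Carlo estimate of Pr(|T| >= c) for a noncentral t variable."""
    rng = random.Random(seed)
    hits = sum(abs(noncentral_t_draw(rng, r, delta)) >= c for _ in range(draws))
    return hits / draws

for delta in (0.0, 1.0, 2.0, 3.0):
    print(delta, estimated_power(r=5, delta=delta, c=2.571))
```

At $\delta = 0$ the estimate should be near the significance level 0.05, and it should grow as $|\delta|$ increases.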

In Example 2, in testing the equality of the means of two normal 
distributions, it was assumed that the unknown variances of the 
distributions were equal. Let us now consider the problem of testing 
the equality of these two unknown variances. 

Example 3. We are given the independent random samples $X_1, \ldots, X_n$
and $Y_1, \ldots, Y_m$ from the distributions, which are $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$,
respectively. We have

$$\Omega = \{(\theta_1, \theta_2, \theta_3, \theta_4) : -\infty < \theta_1, \theta_2 < \infty,\ 0 < \theta_3, \theta_4 < \infty\}.$$

The hypothesis $H_0: \theta_3 = \theta_4$, unspecified, with $\theta_1$ and $\theta_2$ also unspecified, is to
be tested against all alternatives. Then

$$\omega = \{(\theta_1, \theta_2, \theta_3, \theta_4) : -\infty < \theta_1, \theta_2 < \infty,\ 0 < \theta_3 = \theta_4 < \infty\}.$$

It is easy to show (see Exercise 9.30) that the statistic defined by $\lambda = L(\hat{\omega})/L(\hat{\Omega})$
is a function of the statistic

$$F = \frac{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}{\sum_{1}^{m}(Y_i - \bar{Y})^2/(m - 1)}.$$

If $\theta_3 = \theta_4$, this statistic $F$ has an $F$-distribution with $n - 1$ and $m - 1$ degrees
of freedom. The hypothesis that $(\theta_1, \theta_2, \theta_3, \theta_4) \in \omega$ is rejected if the computed
$F \le c_1$ or if the computed $F \ge c_2$. The constants $c_1$ and $c_2$ are usually selected
so that, if $\theta_3 = \theta_4$,

$$\Pr(F \le c_1) = \Pr(F \ge c_2) = \frac{\alpha_1}{2},$$

where $\alpha_1$ is the desired significance level of this test.

Often, under $H_0$, it is difficult to determine the distribution of
$\lambda = \lambda(X_1, X_2, \ldots, X_n)$ or the distribution of an equivalent statistic
upon which to base the likelihood ratio test. Hence it is impossible to
find $\lambda_0$ such that $\Pr[\lambda \le \lambda_0;\ H_0]$ equals an appropriate value of $\alpha$. The
fact that the maximum likelihood estimators in a regular case have a
joint normal distribution does, however, provide a solution. Using this
fact, in a more advanced course, we can show that $-2 \ln \lambda$ has, given
$H_0$ is true, an approximate chi-square distribution with $r$ degrees of
freedom, where $r$ = the dimension of $\Omega$ − the dimension of $\omega$. For
illustration, in Example 1, the dimension of $\Omega = 2$ and the dimension
of $\omega = 1$ and $r = 2 - 1 = 1$.

Also, in that example, note that

$$-2 \ln \lambda = n \ln\left\{1 + \frac{n\bar{x}^2}{\sum_{1}^{n}(x_i - \bar{x})^2}\right\} = n \ln\left(1 + \frac{\bar{x}^2}{s^2}\right),$$

where $s^2 = \sum_{1}^{n}(x_i - \bar{x})^2/n$.
Hence, with $n$ large so that $\bar{x}^2/s^2$ is close to zero under $H_0: \theta_1 = 0$, let
us approximate the right-hand member by two terms of a Taylor's
series expanded about zero:

$$-2 \ln \lambda \approx 0 + \frac{n\bar{x}^2}{s^2}.$$

Since $n$ is large, we can replace $n$ by $n - 1$ to get the approximation

$$-2 \ln \lambda \approx \left(\frac{\bar{x}}{s/\sqrt{n - 1}}\right)^2 = t^2.$$

But $T = \bar{X}/(S/\sqrt{n - 1})$ under $H_0: \theta_1 = 0$ has a $t$-distribution with
$n - 1$ degrees of freedom. Moreover, with large $n - 1$, the distribution
of $T$ is approximately $N(0, 1)$ and the square of a standardized normal
variable is $\chi^2(1)$, which is in agreement with the stated result. Exercise
9.31 provides another illustration of the fact that $-2 \ln \lambda$ has an
approximate chi-square distribution.
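
The quality of the approximation $-2 \ln \lambda \approx t^2$ can be seen by holding the observed $t$ fixed and letting $n$ grow. In Example 1, $n\bar{x}^2/\sum(x_i - \bar{x})^2 = t^2/(n - 1)$, so $-2 \ln \lambda = n \ln[1 + t^2/(n - 1)]$ exactly. The sketch below evaluates this for an assumed illustrative value $t = 2$; the function name is hypothetical.

```python
import math

def minus_two_log_lambda(t, n):
    """Exact -2 ln(lambda) = n ln(1 + t^2 / (n - 1)) for the one-sample test."""
    return n * math.log(1.0 + t * t / (n - 1))

t = 2.0                                    # illustrative observed t value
values = [minus_two_log_lambda(t, n) for n in (10, 100, 1000)]
for n, v in zip((10, 100, 1000), values):
    print(n, round(v, 4), "versus t^2 =", t * t)
```

The exact value approaches $t^2$ as $n$ increases, in line with the two-term Taylor argument.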
EXERCISES

9.23. In Example 1 let $n = 10$, and let the experimental values of the random
variables yield $\bar{x} = 0.6$ and $\sum_{1}^{10}(x_i - \bar{x})^2 = 3.6$. If the test derived in that
example is used, do we accept or reject $H_0: \theta_1 = 0$ at the 5 percent
significance level?

9.24. In Example 2 let $n = m = 8$, $\bar{x} = 75.2$, $\bar{y} = 78.6$, $\sum_{1}^{8}(x_i - \bar{x})^2 = 71.2$,
$\sum_{1}^{8}(y_i - \bar{y})^2 = 54.8$. If we use the test derived in that example, do we accept
or reject $H_0: \theta_1 = \theta_2$ at the 5 percent significance level?


9.25. Show that the likelihood ratio principle leads to the same test, when
testing a simple hypothesis $H_0$ against an alternative simple hypothesis $H_1$,
as that given by the Neyman-Pearson theorem. Note that there are only
two points in $\Omega$.

9.26. Verify Equations (1) of Example 2 of this section. 

9.27. Verify Equations (2) of Example 2 of this section. 

9.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(\theta, 1)$. Show that the likelihood ratio principle for testing $H_0: \theta = \theta'$,
where $\theta'$ is specified, against $H_1: \theta \ne \theta'$ leads to the inequality $|\bar{x} - \theta'| \ge c$.
Is this a uniformly most powerful test of $H_0$ against $H_1$?


9.29. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(\theta_1, \theta_2)$. Show that the likelihood ratio principle for testing $H_0: \theta_2 = \theta_2'$
specified, and $\theta_1$ unspecified, against $H_1: \theta_2 \ne \theta_2'$, $\theta_1$ unspecified, leads to a
test that rejects when $\sum_{1}^{n}(x_i - \bar{x})^2 \le c_1$ or $\sum_{1}^{n}(x_i - \bar{x})^2 \ge c_2$, where $c_1 < c_2$
are selected appropriately.

9.30. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be independent random samples from
the distributions $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$, respectively.

(a) Show that the likelihood ratio for testing $H_0: \theta_1 = \theta_2$, $\theta_3 = \theta_4$ against
all alternatives is given by

$$\frac{\left[\sum_{1}^{n}(x_i - \bar{x})^2/n\right]^{n/2}\left[\sum_{1}^{m}(y_i - \bar{y})^2/m\right]^{m/2}}{\left\{\left[\sum_{1}^{n}(x_i - u)^2 + \sum_{1}^{m}(y_i - u)^2\right]\Big/(m + n)\right\}^{(n+m)/2}},$$

where $u = (n\bar{x} + m\bar{y})/(n + m)$.

(b) Show that the likelihood ratio test for testing $H_0: \theta_3 = \theta_4$, $\theta_1$ and $\theta_2$
unspecified, against $H_1: \theta_3 \ne \theta_4$, $\theta_1$ and $\theta_2$ unspecified, can be based on
the random variable

$$F = \frac{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}{\sum_{1}^{m}(Y_i - \bar{Y})^2/(m - 1)}.$$

(c) If $\theta_3 = \theta_4$, argue that the $F$-statistic in part (b) is independent of the
$T$-statistic of Example 2 of this section.

9.31. Let $n$ independent trials of an experiment be such that $x_1, x_2, \ldots, x_k$ are
the respective numbers of times that the experiment ends in the mutually
exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. If $p_i = P(A_i)$ is constant
throughout the $n$ trials, then the probability of that particular sequence of
trials is $L = p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$.

(a) Recalling that $p_1 + p_2 + \cdots + p_k = 1$, show that the likelihood ratio for
testing $H_0: p_i = p_{i0} > 0$, $i = 1, 2, \ldots, k$, against all alternatives is given
by

$$\lambda = \prod_{i=1}^{k}\left[\frac{(p_{i0})^{x_i}}{(x_i/n)^{x_i}}\right].$$

(b) Show that

$$-2 \ln \lambda = \sum_{i=1}^{k} \frac{x_i(x_i - np_{i0})^2}{(np_i')^2},$$

where $p_i'$ is between $p_{i0}$ and $x_i/n$.

Hint: Expand $\ln p_{i0}$ in a Taylor's series with the remainder in the
term involving $(p_{i0} - x_i/n)^2$.

(c) For large $n$, argue that $x_i/(np_i')^2$ is approximated by $1/(np_{i0})$ and hence

$$-2 \ln \lambda \approx \sum_{i=1}^{k} \frac{(x_i - np_{i0})^2}{np_{i0}}, \quad \text{when } H_0 \text{ is true.}$$

In Section 6.6 we said the right-hand member of this last equation
defines a statistic that has an approximate chi-square distribution
with $k - 1$ degrees of freedom. Note that

dimension of $\Omega$ − dimension of $\omega$ = $(k - 1) - 0 = k - 1$.

9.32. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size $n = 5$ from a distribution with p.d.f. $f(x; \theta) = \frac{1}{2}e^{-|x - \theta|}$, $-\infty < x < \infty$,
for all real $\theta$. Find the likelihood ratio test $\lambda$ for testing $H_0: \theta = \theta_0$ against
$H_1: \theta \ne \theta_0$.

9.33. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent random samples
from the two normal distributions $N(0, \theta_1)$ and $N(0, \theta_2)$.

(a) Find the likelihood ratio $\lambda$ for testing the composite hypothesis
$H_0: \theta_1 = \theta_2$ against the composite alternative $H_1: \theta_1 \ne \theta_2$.

(b) This $\lambda$ is a function of what $F$-statistic that would actually be used in
this test?


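9.34. A random sample $X_1, X_2, \ldots, X_n$ arises from a distribution given by

$$H_0: f(x; \theta) = \frac{1}{\theta}, \quad 0 < x < \theta, \text{ zero elsewhere,}$$

or

$$H_1: f(x; \theta) = \frac{1}{\theta}e^{-x/\theta}, \quad 0 < x < \infty, \text{ zero elsewhere.}$$

Determine the likelihood ratio ($\lambda$) test associated with the test of $H_0$ against
$H_1$.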

9.35. Let $X$ and $Y$ be two independent random variables with respective
probability density functions

$$f(x; \theta_i) = \frac{1}{\theta_i}e^{-x/\theta_i}, \quad 0 < x < \infty,$$

zero elsewhere, $i = 1, 2$. To test $H_0: \theta_1 = \theta_2$ against $H_1: \theta_1 \ne \theta_2$, two
independent random samples of sizes $n_1$ and $n_2$, respectively, were taken
from these distributions. Find the likelihood ratio $\lambda$ and show that $\lambda$ can
be written as a function of a statistic having an $F$-distribution, under $H_0$.

9.36. Consider the two uniform distributions with respective probability
density functions

$$f(x; \theta_i) = \frac{1}{2\theta_i}, \quad -\theta_i < x < \theta_i,$$

zero elsewhere, $i = 1, 2$. The null hypothesis is $H_0: \theta_1 = \theta_2$ while
the alternative is $H_1: \theta_1 \ne \theta_2$. Let $X_1 < X_2 < \cdots < X_{n_1}$ and
$Y_1 < Y_2 < \cdots < Y_{n_2}$ be the order statistics of two independent random
samples from the two distributions, respectively. Using the likelihood
ratio $\lambda$, find the statistic used to test $H_0$ against $H_1$. Find the distribution
of $-2 \ln \lambda$ when $H_0$ is true. Note that in this nonregular case the number
of degrees of freedom is two times the difference of the dimensions of $\Omega$
and $\omega$.


9.4 The Sequential Probability Ratio Test 

In Section 9.1 we proved a theorem that provided us with a
method for determining a best critical region for testing a simple
hypothesis against an alternative simple hypothesis. The theorem was
as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample with fixed
sample size $n$ from a distribution that has p.d.f. $f(x; \theta)$, where
$\theta \in \{\theta : \theta = \theta', \theta''\}$ and $\theta'$ and $\theta''$ are known numbers. Let the joint p.d.f.
of $X_1, X_2, \ldots, X_n$ be denoted by

$$L(\theta, n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta),$$

a notation that reveals both the parameter $\theta$ and the sample size $n$. If
we reject $H_0: \theta = \theta'$ and accept $H_1: \theta = \theta''$ when and only when

$$\frac{L(\theta', n)}{L(\theta'', n)} \le k,$$

where $k > 0$, then this is a best test of $H_0$ against $H_1$.

Let us now suppose that the sample size n is not fixed in advance. 
In fact, let the sample size be a random variable N with sample space 
{n:n = 1, 2, 3,...}. An interesting procedure for testing the simple 
hypothesis H 0 : 9 = 6' against the simple hypothesis if, ： 0 = 0" is 
the following. Let k 0 and /c, be two positive constants with k 0 < k x . 
Observe the independent outcomes X t , X 2 , ... in sequence, say 

x lt x 2 , jc 3 , ..., and compute 

L(e\ 1) L(0\ 2) L(9\ 3) 

L(r, 1) 5 攀， 2) ’ L{9\ 3) ，… . 

The hypothesis H 0 :9 = 6 r is rejected (and H x \6 = 9" is accepted) if 
and only if there exists a positive integer n so that (jci, x 2 ,..., x„) 
belongs to the set 


$$C_n = \left\{ (x_1, \ldots, x_n) : k_0 < \frac{L(\theta', j)}{L(\theta'', j)} < k_1,\ j = 1, \ldots, n - 1,\ \text{and}\ \frac{L(\theta', n)}{L(\theta'', n)} \le k_0 \right\}.$$

On the other hand, the hypothesis $H_0 : \theta = \theta'$ is accepted (and
$H_1 : \theta = \theta''$ is rejected) if and only if there exists a positive integer $n$ so
that $(x_1, x_2, \ldots, x_n)$ belongs to the set

$$B_n = \left\{ (x_1, \ldots, x_n) : k_0 < \frac{L(\theta', j)}{L(\theta'', j)} < k_1,\ j = 1, 2, \ldots, n - 1,\ \text{and}\ \frac{L(\theta', n)}{L(\theta'', n)} \ge k_1 \right\}.$$

That is, we continue to observe sample observations as long as

$$k_0 < \frac{L(\theta', n)}{L(\theta'', n)} < k_1. \tag{1}$$

We stop these observations in one of two ways:

1. With rejection of $H_0 : \theta = \theta'$ as soon as

$$\frac{L(\theta', n)}{L(\theta'', n)} \le k_0,$$

or

2. with acceptance of $H_0 : \theta = \theta'$ as soon as

$$\frac{L(\theta', n)}{L(\theta'', n)} \ge k_1.$$


A test of this kind is called Wald's sequential probability ratio test.
Now, frequently inequality (1) can be conveniently expressed in an
equivalent form

$$c_0(n) < u(x_1, x_2, \ldots, x_n) < c_1(n),$$

where $u(X_1, X_2, \ldots, X_n)$ is a statistic and $c_0(n)$ and $c_1(n)$ depend on the
constants $k_0$, $k_1$, $\theta'$, $\theta''$, and on $n$. Then the observations are stopped
and a decision is reached as soon as

$$u(x_1, x_2, \ldots, x_n) \le c_0(n) \quad \text{or} \quad u(x_1, x_2, \ldots, x_n) \ge c_1(n).$$
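The stopping rule just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name
`sprt`, its argument `log_lr_step`, and the decision labels are ours, not
notation from the text. The caller supplies the contribution of one
observation to $\ln [L(\theta', n)/L(\theta'', n)]$, and the loop stops at
the first $n$ for which the ratio leaves the interval $(k_0, k_1)$.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def sprt(
    observations: Iterable[float],
    log_lr_step: Callable[[float], float],
    k0: float,
    k1: float,
) -> Tuple[Optional[str], int]:
    """Run a sequential probability ratio test on a stream of outcomes.

    log_lr_step(x) must return ln f(x; theta') - ln f(x; theta''), so the
    running sum equals ln[L(theta', n) / L(theta'', n)].
    Returns (decision, n); decision is None if the data run out first.
    """
    if not 0 < k0 < k1:
        raise ValueError("need 0 < k0 < k1")
    log_k0, log_k1 = math.log(k0), math.log(k1)
    log_ratio = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        log_ratio += log_lr_step(x)
        if log_ratio <= log_k0:      # ratio <= k0: reject H0
            return "reject H0", n
        if log_ratio >= log_k1:      # ratio >= k1: accept H0
            return "accept H0", n
    return None, n                   # still inside (k0, k1)
```

As a usage illustration with values chosen by us, take Bernoulli outcomes
with $\theta' = 1/3$ and $\theta'' = 2/3$, for which one observation $x$
contributes $(1 - 2x) \ln 2$ to the log ratio; then
`sprt(xs, lambda x: math.log(2.0) * (1 - 2 * x), 1 / 9, 9)` carries out the
test on the sequence `xs`.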

We now give an illustrative example. 

Example 1. Let $X$ have a p.d.f.

$$f(x; \theta) = \theta^x (1 - \theta)^{1 - x}, \quad x = 0, 1,$$
$$= 0 \quad \text{elsewhere}.$$

In the preceding discussion of a sequential probability ratio test, let
$H_0 : \theta = \frac{1}{3}$ and $H_1 : \theta = \frac{2}{3}$; then, with $\sum x_i = \sum_{i=1}^{n} x_i$,

$$\frac{L(\frac{1}{3}, n)}{L(\frac{2}{3}, n)} = \frac{(\frac{1}{3})^{\sum x_i} (\frac{2}{3})^{n - \sum x_i}}{(\frac{2}{3})^{\sum x_i} (\frac{1}{3})^{n - \sum x_i}} = 2^{\,n - 2 \sum x_i}.$$

If we take logarithms to the base 2, the inequality

$$k_0 < \frac{L(\frac{1}{3}, n)}{L(\frac{2}{3}, n)} < k_1,$$

with $0 < k_0 < k_1$, becomes

$$\log_2 k_0 < n - 2 \sum_{1}^{n} x_i < \log_2 k_1,$$

or, equivalently,

$$c_0(n) = \frac{n}{2} - \frac{1}{2} \log_2 k_1 < \sum_{1}^{n} x_i < \frac{n}{2} - \frac{1}{2} \log_2 k_0 = c_1(n).$$

Note that $L(\frac{1}{3}, n)/L(\frac{2}{3}, n) \le k_0$ if and only if $c_1(n) \le \sum_1^n x_i$; and
$L(\frac{1}{3}, n)/L(\frac{2}{3}, n) \ge k_1$ if and only if $c_0(n) \ge \sum_1^n x_i$. Thus we continue to
observe outcomes as long as $c_0(n) < \sum_1^n x_i < c_1(n)$. The observation of
outcomes is discontinued with the first value $n$ of $N$ for which either
$c_1(n) \le \sum_1^n x_i$ or $c_0(n) \ge \sum_1^n x_i$. The inequality $c_1(n) \le \sum_1^n x_i$ leads to
the rejection of $H_0 : \theta = \frac{1}{3}$ (the acceptance of $H_1$), and the inequality
$c_0(n) \ge \sum_1^n x_i$ leads to the acceptance of $H_0 : \theta = \frac{1}{3}$ (the rejection of
$H_1$).

Remarks. At this point, the reader undoubtedly sees that there are many
questions that should be raised in connection with the sequential probability
ratio test. Some of these questions are possibly among the following:

1. What is the probability of the procedure continuing indefinitely?

2. What is the value of the power function of this test at each of the points
$\theta = \theta'$ and $\theta = \theta''$?

3. If $\theta''$ is one of several values of $\theta$ specified by an alternative composite
hypothesis, say $H_1 : \theta > \theta'$, what is the power function at each point $\theta \ge \theta'$?

4. Since the sample size is a random variable, what are some of the
properties of the distribution of $N$? In particular, what is the expected value
$E(N)$ of $N$?

5. How does this test compare with tests that have a fixed sample size $n$?

A course in sequential analysis would investigate these and many other
problems. However, in this book our objective is largely that of acquainting
the reader with this kind of test procedure. Accordingly, we assert that the
answer to question 1 is zero. Moreover, it can be proved that if $\theta = \theta'$ or if
$\theta = \theta''$, $E(N)$ is smaller, for this sequential procedure, than the sample size of
a fixed-sample-size test which has the same values of the power function at
those points. We now consider question 2 in some detail.

In this section we shall denote the power of the test when $H_0$ is
true by the symbol $\alpha$ and the power of the test when $H_1$ is true by the
symbol $1 - \beta$. Thus $\alpha$ is the probability of committing a type I error
(the rejection of $H_0$ when $H_0$ is true), and $\beta$ is the probability of
committing a type II error (the acceptance of $H_0$ when $H_0$ is false). With
the sets $C_n$ and $B_n$ as previously defined, and with random variables of
the continuous type, we then have


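$$\alpha = \sum_{n=1}^{\infty} \int_{C_n} L(\theta', n), \qquad 1 - \beta = \sum_{n=1}^{\infty} \int_{C_n} L(\theta'', n).$$

Since the probability is 1 that the procedure will terminate, we also
have

$$1 - \alpha = \sum_{n=1}^{\infty} \int_{B_n} L(\theta', n), \qquad \beta = \sum_{n=1}^{\infty} \int_{B_n} L(\theta'', n).$$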

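If $(x_1, x_2, \ldots, x_n) \in C_n$, we have $L(\theta', n) \le k_0 L(\theta'', n)$; hence it is clear
that

$$\alpha = \sum_{n=1}^{\infty} \int_{C_n} L(\theta', n) \le \sum_{n=1}^{\infty} \int_{C_n} k_0 L(\theta'', n) = k_0 (1 - \beta).$$

Because $L(\theta', n) \ge k_1 L(\theta'', n)$ at each point of the set $B_n$, we have

$$1 - \alpha = \sum_{n=1}^{\infty} \int_{B_n} L(\theta', n) \ge \sum_{n=1}^{\infty} \int_{B_n} k_1 L(\theta'', n) = k_1 \beta.$$

Accordingly, it follows that

$$\frac{\alpha}{1 - \beta} \le k_0, \qquad k_1 \le \frac{1 - \alpha}{\beta}, \tag{2}$$

provided that $\beta$ is not equal to zero or 1.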

Now let $\alpha_a$ and $\beta_a$ be preassigned proper fractions; some typical
values in the applications are 0.01, 0.05, and 0.10. If we take

$$k_0 = \frac{\alpha_a}{1 - \beta_a}, \qquad k_1 = \frac{1 - \alpha_a}{\beta_a},$$

then inequalities (2) become

$$\frac{\alpha}{1 - \beta} \le \frac{\alpha_a}{1 - \beta_a}, \qquad \frac{1 - \alpha_a}{\beta_a} \le \frac{1 - \alpha}{\beta}; \tag{3}$$

or, equivalently,

$$\alpha (1 - \beta_a) \le (1 - \beta) \alpha_a, \qquad \beta (1 - \alpha_a) \le (1 - \alpha) \beta_a.$$

If we add corresponding members of the immediately preceding
inequalities, we find that

$$\alpha + \beta - \alpha \beta_a - \beta \alpha_a \le \alpha_a + \beta_a - \beta \alpha_a - \alpha \beta_a$$

and hence

$$\alpha + \beta \le \alpha_a + \beta_a.$$

That is, the sum $\alpha + \beta$ of the probabilities of the two kinds of errors
is bounded above by the sum $\alpha_a + \beta_a$ of the preassigned numbers.
Moreover, since $\alpha$ and $\beta$ are positive proper fractions, inequalities (3)
imply that

$$\alpha \le \frac{\alpha_a}{1 - \beta_a}, \qquad \beta \le \frac{\beta_a}{1 - \alpha_a};$$

consequently, we have an upper bound on each of $\alpha$ and $\beta$. Various
investigations of the sequential probability ratio test seem to indicate
that in most practical cases, the values of $\alpha$ and $\beta$ are quite close to $\alpha_a$
and $\beta_a$. This prompts us to approximate the power function at the
points $\theta = \theta'$ and $\theta = \theta''$ by $\alpha_a$ and $1 - \beta_a$, respectively.
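The choice of $k_0$ and $k_1$ and the resulting bounds are simple enough to
compute directly. The following Python sketch is offered only as an
illustration; the function names `wald_thresholds` and `error_bounds` are
hypothetical, and the requirement $\alpha_a + \beta_a < 1$ is imposed by us
so that $k_0 < k_1$ holds.

```python
def wald_thresholds(alpha_a: float, beta_a: float):
    """Return (k0, k1) for preassigned error levels alpha_a and beta_a."""
    if not (0 < alpha_a < 1 and 0 < beta_a < 1 and alpha_a + beta_a < 1):
        raise ValueError("need proper fractions with alpha_a + beta_a < 1")
    k0 = alpha_a / (1 - beta_a)
    k1 = (1 - alpha_a) / beta_a
    return k0, k1


def error_bounds(alpha_a: float, beta_a: float):
    """Upper bounds on alpha, on beta, and on alpha + beta."""
    return {
        "alpha": alpha_a / (1 - beta_a),
        "beta": beta_a / (1 - alpha_a),
        "alpha_plus_beta": alpha_a + beta_a,
    }
```

With $\alpha_a = \beta_a = 0.10$ the thresholds are $k_0 = 1/9$ and $k_1 = 9$,
and each of $\alpha$ and $\beta$ is bounded above by $1/9$.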


Example 2. Let $X$ be $N(\theta, 100)$. To find the sequential probability ratio test
for testing $H_0 : \theta = 75$ against $H_1 : \theta = 78$ such that each of $\alpha$ and $\beta$ is
approximately equal to 0.10, take

$$k_0 = \frac{0.10}{1 - 0.10} = \frac{1}{9}, \qquad k_1 = \frac{1 - 0.10}{0.10} = 9.$$

Since

$$\frac{L(75, n)}{L(78, n)} = \frac{\exp\left[-\sum (x_i - 75)^2 / 2(100)\right]}{\exp\left[-\sum (x_i - 78)^2 / 2(100)\right]} = \exp\left(-\frac{6 \sum x_i - 459 n}{200}\right),$$

the inequality

$$k_0 = \frac{1}{9} < \frac{L(75, n)}{L(78, n)} < 9 = k_1$$

can be rewritten, by taking logarithms, as

$$-\ln 9 < \frac{6 \sum x_i - 459 n}{200} < \ln 9.$$

This inequality is equivalent to the inequality

$$c_0(n) = \frac{153}{2} n - \frac{100}{3} \ln 9 < \sum_{1}^{n} x_i < \frac{153}{2} n + \frac{100}{3} \ln 9 = c_1(n).$$

Moreover, $L(75, n)/L(78, n) \le k_0$ and $L(75, n)/L(78, n) \ge k_1$ are equivalent
to the inequalities $\sum_1^n x_i \ge c_1(n)$ and $\sum_1^n x_i \le c_0(n)$, respectively. Thus the
observation of outcomes is discontinued with the first value $n$ of $N$ for which
either $\sum_1^n x_i \ge c_1(n)$ or $\sum_1^n x_i \le c_0(n)$. The inequality $\sum_1^n x_i \ge c_1(n)$ leads to the
rejection of $H_0 : \theta = 75$, and the inequality $\sum_1^n x_i \le c_0(n)$ leads to the acceptance
of $H_0 : \theta = 75$. The power of the test is approximately 0.10 when $H_0$ is true,
and approximately 0.90 when $H_1$ is true.
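The approximations in Example 2 can be checked informally by simulation.
The Python sketch below is an illustration only: the helper names, the
number of runs, and the seed are our choices, and the Monte Carlo estimates
it returns carry sampling error. It uses the boundaries
$c_0(n) = \frac{153}{2} n - \frac{100}{3} \ln 9$ and
$c_1(n) = \frac{153}{2} n + \frac{100}{3} \ln 9$ for the running sum.

```python
import math
import random

THETA0, THETA1, VARIANCE = 75.0, 78.0, 100.0
K0, K1 = 1.0 / 9.0, 9.0


def boundaries(n):
    """Return (c0(n), c1(n)) for the running sum of n observations."""
    midpoint = (THETA0 + THETA1) / 2.0        # 153/2
    scale = VARIANCE / (THETA1 - THETA0)      # 100/3
    c0 = midpoint * n - scale * math.log(K1)
    c1 = midpoint * n - scale * math.log(K0)
    return c0, c1


def run_once(theta, rng, max_n=10_000):
    """Sample N(theta, 100) outcomes until a boundary is reached."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(theta, math.sqrt(VARIANCE))
        c0, c1 = boundaries(n)
        if total >= c1:
            return "reject H0", n
        if total <= c0:
            return "accept H0", n
    return None, max_n


def estimate(theta, runs=2000, seed=0):
    """Monte Carlo estimate of (Pr(reject H0), average sample size)."""
    rng = random.Random(seed)
    rejections = 0
    total_n = 0
    for _ in range(runs):
        decision, n = run_once(theta, rng)
        rejections += decision == "reject H0"
        total_n += n
    return rejections / runs, total_n / runs
```

Calling `estimate(75.0)` and `estimate(78.0)` gives rough estimates of the
power function at the two points, together with the average value of $N$,
which can be compared with the sample size a fixed-sample-size test would
need for the same error probabilities.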


Remark. It is interesting to note that a sequential probability ratio test can
be thought of as a random-walk procedure. For illustrations, the final
inequalities of Examples 1 and 2 can be rewritten as

$$-\log_2 k_1 < \sum_{1}^{n} 2 (x_i - 0.5) < -\log_2 k_0$$

and

$$-\frac{100}{3} \ln 9 < \sum_{1}^{n} (x_i - 76.5) < \frac{100}{3} \ln 9,$$

respectively. In each instance, we can think of starting at the point zero and
taking random steps until one of the boundaries is reached. In the first
situation the random steps are $2(X_1 - 0.5), 2(X_2 - 0.5), 2(X_3 - 0.5), \ldots$ and
hence are of the same length, 1, but with random directions. In the second
instance, both the length and the direction of the steps are random variables,
$X_1 - 76.5, X_2 - 76.5, X_3 - 76.5, \ldots$.
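The random-walk view of Example 1 can be made concrete with a short Python
sketch. It is an illustration only; the function name `bernoulli_walk` and
the decision labels are ours. Each outcome $x \in \{0, 1\}$ moves the walk
by $2(x - 0.5) = \pm 1$, and the walk stops at the first $n$ for which it
reaches $-\log_2 k_0$ (rejection of $H_0$) or $-\log_2 k_1$ (acceptance of
$H_0$).

```python
import math


def bernoulli_walk(xs, k0, k1):
    """Follow the walk sum 2(x_i - 0.5) until a boundary is reached.

    Returns (decision, n, position); decision is None if the outcomes
    run out while the walk is still between the boundaries.
    """
    lower = -math.log2(k1)
    upper = -math.log2(k0)
    position = 0
    for n, x in enumerate(xs, start=1):
        position += 2 * x - 1          # step of length 1, random direction
        if position >= upper:          # same event as L'/L'' <= k0
            return "reject H0", n, position
        if position <= lower:          # same event as L'/L'' >= k1
            return "accept H0", n, position
    return None, len(xs), position
```

With $k_0 = 1/9$ and $k_1 = 9$ the boundaries are $\pm \log_2 9$, a little
more than 3, so four net steps in one direction are needed before the walk
stops.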


In recent years, there has been much attention to improving quality
of products using statistical methods. One such simple method was
developed by Walter Shewhart in which a sample of size $n$ of the items
being produced is taken and they are measured, resulting in $n$ values.
The mean $\bar{x}$ of these $n$ measurements has an approximate normal
distribution with mean $\mu$ and variance $\sigma^2/n$. In practice, $\mu$ and $\sigma^2$ must
be estimated, but in this discussion, we assume that they are known.
From theory we know that the probability is 0.997 that $\bar{x}$ is between

$$\text{LCL} = \mu - \frac{3\sigma}{\sqrt{n}} \qquad \text{and} \qquad \text{UCL} = \mu + \frac{3\sigma}{\sqrt{n}}.$$

These two values are called the lower (LCL) and upper (UCL) control
limits, respectively. Samples like this are taken periodically, resulting
in a sequence of means, say $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$. These are usually plotted;
and if they are between the LCL and UCL, we say that the process
is in control. If one falls outside the limits, this would suggest that the
mean $\mu$ has shifted, and the process would be investigated.

It was recognized by some that there could be a shift in the mean,
say from $\mu$ to $\mu + (\sigma/\sqrt{n})$; and it would still be difficult to detect that
shift with a single sample mean as now the probability of a single $\bar{x}$
exceeding UCL is only about 0.023. This means that we would need
about $1/0.023 \approx 43$ samples, each of size $n$, on the average before
detecting such a shift. This seems too long; so statisticians recognized
that they should be cumulating experience as the sequence
$\bar{x}_1, \bar{x}_2, \ldots$ is observed in order to help them detect the shift sooner.
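The figures 0.997, 0.023, and 43 quoted above follow from the standard
normal distribution function, as the following Python sketch shows. The
function name `detection_probability` is ours, and the code assumes, as the
discussion does, that $\mu$ and $\sigma$ are known and that the sample mean
is normally distributed.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def detection_probability(shift=1.0, limit=3.0):
    """Pr(sample mean exceeds UCL) after the mean moves up by
    `shift` standard errors, with control limits at +/- `limit`."""
    return 1.0 - PHI(limit - shift)


in_control = PHI(3.0) - PHI(-3.0)      # Pr(LCL < xbar < UCL), no shift
p_detect = detection_probability()     # 1 - PHI(2)
expected_samples = 1.0 / p_detect      # mean of a geometric waiting time
```

Here `p_detect` is about 0.023, and its reciprocal is the average number of
samples needed before one mean exceeds UCL; the text's value of 43 comes
from rounding the probability to 0.023 before taking the reciprocal.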


It is the practice to compute the standardized variable
$Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$; thus we state the problem in these terms and
provide the solution given by a sequential probability ratio test.

Here $Z$ is $N(\theta, 1)$, and we wish to test $H_0 : \theta = 0$ against $H_1 : \theta = 1$
using the sequence of i.i.d. random variables $Z_1, Z_2, \ldots, Z_m, \ldots$. We
use $m$ rather than $n$, as the latter is the size of the samples taken
periodically. We have

$$\frac{L(0, m)}{L(1, m)} = \frac{\exp\left[-\sum z_i^2 / 2\right]}{\exp\left[-\sum (z_i - 1)^2 / 2\right]} = \exp\left[-\sum_{i=1}^{m} (z_i - 0.5)\right].$$

Thus

$$k_0 < \exp\left[-\sum_{i=1}^{m} (z_i - 0.5)\right] < k_1$$

can be rewritten as

$$h = -\ln k_0 > \sum_{i=1}^{m} (z_i - 0.5) > -\ln k_1 = -h.$$

It is true that $-\ln k_0 = \ln k_1$ when $\alpha_a = \beta_a$. Often, $h = -\ln k_0$ is taken
to be about 4 or 5, suggesting that $\alpha_a = \beta_a$ is small, like 0.01. As
$\sum (z_i - 0.5)$ is cumulating the sum of $z_i - 0.5$, $i = 1, 2, 3, \ldots$, these
procedures are often called CUSUMS. If the CUSUM $= \sum (z_i - 0.5)$
exceeds $h$, we would investigate the process, as it seems that the mean
has shifted upward. If this shift is to $\theta = 1$, the theory associated with
these procedures shows that we need only 8 or 9 samples on the average,
rather than 43, to detect this shift. For more information about these
methods, the reader is referred to one of the many books on quality
improvement through statistical methods. What we would like to
emphasize here is that, through sequential methods (not only the
sequential probability ratio test), we should take advantage of all past
experience that we can gather in making inferences.
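The CUSUM form of the test translates directly into code. The Python
sketch below is an illustration under our own naming (`cusum_sprt`,
`reference`, and the decision labels are not from the text); it takes
$k_0 = \alpha_a/(1 - \beta_a)$ and $k_1 = (1 - \alpha_a)/\beta_a$ as in this
section and stops as soon as the cumulative sum of $z_i - 0.5$ reaches
$-\ln k_0$ or $-\ln k_1$.

```python
import math


def cusum_sprt(z_values, alpha_a=0.01, beta_a=0.01, reference=0.5):
    """Cumulate z_i - reference and stop at the SPRT boundaries.

    Returns (decision, m, cusum); decision is None if the standardized
    means run out before either boundary is reached.
    """
    upper = -math.log(alpha_a / (1.0 - beta_a))    # -ln k0 (equals h)
    lower = -math.log((1.0 - alpha_a) / beta_a)    # -ln k1 (equals -h)
    cusum = 0.0
    for m, z in enumerate(z_values, start=1):
        cusum += z - reference
        if cusum >= upper:
            return "investigate: mean appears to have shifted", m, cusum
        if cusum <= lower:
            return "no evidence of a shift", m, cusum
    return None, len(z_values), cusum
```

With $\alpha_a = \beta_a = 0.01$ the boundary is $h = \ln 99$, which lies
between 4 and 5, in line with the values of $h$ mentioned above.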
EXERCISES

9.37. Let $X$ be $N(0, \theta)$ and, in the notation of this section, let $\theta' = 4$, $\theta'' = 9$,
$\alpha_a = 0.05$, and $\beta_a = 0.10$. Show that the sequential probability ratio test can
be based upon the statistic $\sum_1^n X_i^2$. Determine $c_0(n)$ and $c_1(n)$.

9.38. Let $X$ have a Poisson distribution with mean $\theta$. Find the sequential
probability ratio test for testing $H_0 : \theta = 0.02$ against $H_1 : \theta = 0.07$. Show
that this test can be based upon the statistic $\sum_1^n X_i$. If $\alpha_a = 0.20$ and
$\beta_a = 0.10$, find $c_0(n)$ and $c_1(n)$.

9.39. Let the independent random variables $Y$ and $Z$ be $N(\mu_1, 1)$ and $N(\mu_2, 1)$,
respectively. Let $\theta = \mu_1 - \mu_2$. Let us observe independent observations
from each distribution, say $Y_1, Y_2, \ldots$ and $Z_1, Z_2, \ldots$. To test sequentially
the hypothesis $H_0 : \theta = 0$ against $H_1 : \theta =$ [illegible], use the sequence
$X_i = Y_i - Z_i$, $i = 1, 2, \ldots$. If $\alpha_a = \beta_a = 0.05$, show that the test can be based
upon $\bar{X} = \bar{Y} - \bar{Z}$. Find $c_0(n)$ and $c_1(n)$.

9.40. Say that a manufacturing process makes about 3 percent defective
items, which is considered satisfactory for this particular product. The
managers would like to decrease this to about 1 percent and clearly want
to guard against a substantial increase, say to 5 percent. To monitor the
process, periodically $n = 100$ items are taken and the number $X$ of defectives
counted. Assume that $X$ is $b(n = 100, p = \theta)$. Based on a sequence
$X_1, X_2, \ldots, X_m, \ldots$, determine a sequential probability ratio test that
tests $H_0 : \theta = 0.01$ against $H_1 : \theta = 0.05$. (Note that $\theta = 0.03$, the present
level, is in between these two values.) Write this test in the form

$$h_0 > \sum_{i=1}^{m} (x_i - nd) > h_1$$

and determine $d$, $h_0$, and $h_1$ if $\alpha_a = \beta_a = 0.02$.

9.5 Minimax, Bayesian, and Classification Procedures 

In Chapters 7 and 8 we considered several procedures which may
be used in problems of point estimation. Among these were decision
function procedures (in particular, minimax decisions) and Bayesian
procedures. In this section, we apply these same principles to the
problem of testing a simple hypothesis $H_0$ against an alternative simple
hypothesis $H_1$. It is important to observe that each of these procedures
yields, in accordance with the Neyman-Pearson theorem, a best test
of $H_0$ against $H_1$.

We first investigate the decision function approach to the problem
of testing a simple hypothesis against a simple alternative hypothesis.
Let the joint p.d.f. of $n$ random variables $X_1, X_2, \ldots, X_n$ depend upon
the parameter $\theta$. Here $n$ is a fixed positive integer. This p.d.f. is denoted
by $L(\theta; x_1, x_2, \ldots, x_n)$ or, for brevity, by $L(\theta)$. Let $\theta'$ and $\theta''$ be
distinct and fixed values of $\theta$. We wish to test the simple hypothesis
$H_0 : \theta = \theta'$ against the simple hypothesis $H_1 : \theta = \theta''$. Thus the
parameter space is $\Omega = \{\theta : \theta = \theta', \theta''\}$. In accordance with the decision
function procedure, we need a function $\delta$ of the observed values of
$X_1, \ldots, X_n$ (or, of the observed value of a statistic $Y$) that decides which
of the two values of $\theta$, $\theta'$ or $\theta''$, to accept. That is, the
function $\delta$ selects either $H_0 : \theta = \theta'$ or $H_1 : \theta = \theta''$. We denote these
decisions by $\delta = \theta'$ and $\delta = \theta''$, respectively. Let $\mathcal{L}(\theta, \delta)$ represent the
loss function associated with this decision problem. Because the pairs
$(\theta = \theta', \delta = \theta')$ and $(\theta = \theta'', \delta = \theta'')$ represent correct decisions, we
shall always take $\mathcal{L}(\theta', \theta') = \mathcal{L}(\theta'', \theta'') = 0$. On the other hand, if
either $\delta = \theta''$ when $\theta = \theta'$ or $\delta = \theta'$ when $\theta = \theta''$, then a positive value
should be assigned to the loss function; that is, $\mathcal{L}(\theta', \theta'') > 0$ and
$\mathcal{L}(\theta'', \theta') > 0$.

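It has previously been emphasized that a test of $H_0 : \theta = \theta'$ against
$H_1 : \theta = \theta''$ can be described in terms of a critical region in the sample
space. We can do the same kind of thing with the decision function.
That is, we can choose a subset $C$ of the sample space and if
$(x_1, x_2, \ldots, x_n) \in C$, we can make the decision $\delta = \theta''$; whereas, if
$(x_1, x_2, \ldots, x_n) \in C^*$, the complement of $C$, we make the decision
$\delta = \theta'$. Thus a given critical region $C$ determines the decision function.
In this sense, we may denote the risk function by $R(\theta, C)$ instead of
$R(\theta, \delta)$. That is, in a notation used in Section 9.1,

$$R(\theta, C) = R(\theta, \delta) = \int_{C \cup C^*} \mathcal{L}(\theta, \delta) L(\theta).$$

Since $\delta = \theta''$ if $(x_1, \ldots, x_n) \in C$ and $\delta = \theta'$ if $(x_1, \ldots, x_n) \in C^*$, we have

$$R(\theta, C) = \int_{C} \mathcal{L}(\theta, \theta'') L(\theta) + \int_{C^*} \mathcal{L}(\theta, \theta') L(\theta). \tag{1}$$

If, in Equation (1), we take $\theta = \theta'$, then $\mathcal{L}(\theta', \theta') = 0$ and hence

$$R(\theta', C) = \int_{C} \mathcal{L}(\theta', \theta'') L(\theta') = \mathcal{L}(\theta', \theta'') \int_{C} L(\theta').$$

On the other hand, if in Equation (1) we let $\theta = \theta''$, then $\mathcal{L}(\theta'', \theta'') = 0$
and, accordingly,

$$R(\theta'', C) = \int_{C^*} \mathcal{L}(\theta'', \theta') L(\theta'') = \mathcal{L}(\theta'', \theta') \int_{C^*} L(\theta'').$$

It is enlightening to note that, if $K(\theta)$ is the power function of the test
associated with the critical region $C$, then

$$R(\theta', C) = \mathcal{L}(\theta', \theta'') K(\theta') = \mathcal{L}(\theta', \theta'') \alpha,$$

where $\alpha = K(\theta')$ is the significance level; and

$$R(\theta'', C) = \mathcal{L}(\theta'', \theta') [1 - K(\theta'')] = \mathcal{L}(\theta'', \theta') \beta,$$

where $\beta = 1 - K(\theta'')$ is the probability of the type II error.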

Let us now see if we can find a minimax solution to our problem.
That is, we want to find a critical region $C$ so that

$$\max\, [R(\theta', C), R(\theta'', C)]$$

is minimized. We shall show that the solution is the region

$$C = \left\{ (x_1, \ldots, x_n) : \frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)} \le k \right\},$$

provided the positive constant $k$ is selected so that $R(\theta', C) = R(\theta'', C)$.
That is, if $k$ is chosen so that

$$\mathcal{L}(\theta', \theta'') \int_{C} L(\theta') = \mathcal{L}(\theta'', \theta') \int_{C^*} L(\theta''),$$

then the critical region $C$ provides a minimax solution. In the case of
random variables of the continuous type, $k$ can always be selected so
that $R(\theta', C) = R(\theta'', C)$. However, with random variables of the
discrete type, we may need to consider an auxiliary random experiment
when $L(\theta')/L(\theta'') = k$ in order to achieve the exact equality
$R(\theta', C) = R(\theta'', C)$.

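To see that this region $C$ is the minimax solution, consider every
other region $A$ for which $R(\theta', C) \ge R(\theta', A)$. Obviously, a region $A$ for
which $R(\theta', C) < R(\theta', A)$ is not a candidate for a minimax solution,
for then $R(\theta', C) = R(\theta'', C) < \max\, [R(\theta', A), R(\theta'', A)]$. Since
$R(\theta', C) \ge R(\theta', A)$ means that

$$\mathcal{L}(\theta', \theta'') \int_{C} L(\theta') \ge \mathcal{L}(\theta', \theta'') \int_{A} L(\theta'),$$

we have

$$\alpha = \int_{C} L(\theta') \ge \int_{A} L(\theta').$$

That is, the significance level of the test associated with the critical
region $A$ is less than or equal to $\alpha$. But $C$, in accordance with the
Neyman-Pearson theorem, is a best critical region of size $\alpha$. Thus

$$\int_{C} L(\theta'') \ge \int_{A} L(\theta'')$$

and

$$\int_{C^*} L(\theta'') \le \int_{A^*} L(\theta'').$$

Accordingly,

$$\mathcal{L}(\theta'', \theta') \int_{C^*} L(\theta'') \le \mathcal{L}(\theta'', \theta') \int_{A^*} L(\theta''),$$

or, equivalently,

$$R(\theta'', C) \le R(\theta'', A).$$

That is,

$$R(\theta', C) = R(\theta'', C) \le R(\theta'', A).$$

This means that

$$\max\, [R(\theta', C), R(\theta'', C)] \le R(\theta'', A).$$

Then certainly,

$$\max\, [R(\theta', C), R(\theta'', C)] \le \max\, [R(\theta', A), R(\theta'', A)],$$

and the critical region $C$ provides a minimax solution, as we wanted
to show.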

Example 1. Let $X_1, X_2, \ldots, X_{100}$ denote a random sample of size 100 from
a distribution that is $N(\theta, 100)$. We again consider the problem of testing
$H_0 : \theta = 75$ against $H_1 : \theta = 78$. We seek a minimax solution with
$\mathcal{L}(75, 78) = 3$ and $\mathcal{L}(78, 75) = 1$. Since $L(75)/L(78) \le k$ is equivalent to
$\bar{x} \ge c$, we want to determine $c$, and thus $k$, so that

$$3 \Pr(\bar{X} \ge c; \theta = 75) = \Pr(\bar{X} < c; \theta = 78).$$

Because $\bar{X}$ is $N(\theta, 1)$, the preceding equation can be rewritten as

$$3[1 - \Phi(c - 75)] = \Phi(c - 78).$$

If we use Table III of the appendix, we see, by trial and error, that the
solution is $c = 76.8$, approximately. The significance level of the test is
$1 - \Phi(1.8) = 0.036$, approximately, and the power of the test when $H_1$ is true
is $1 - \Phi(-1.2) = 0.885$, approximately.
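The trial-and-error solution for $c$ can be replaced by a numerical root
search. The following Python sketch is an illustration only; the function
names and the use of bisection are our choices, not the method of the text.
It solves $3[1 - \Phi(c - 75)] = \Phi(c - 78)$ for the cutoff on the sample
mean.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def risk_gap(c, loss_h0=3.0, loss_h1=1.0, theta0=75.0, theta1=78.0):
    """R(theta', C) - R(theta'', C) for C = {xbar >= c}, xbar ~ N(theta, 1)."""
    risk_when_h0_true = loss_h0 * (1.0 - PHI(c - theta0))
    risk_when_h1_true = loss_h1 * PHI(c - theta1)
    return risk_when_h0_true - risk_when_h1_true


def minimax_cutoff(lo=75.0, hi=78.0, tol=1e-10):
    """Bisection: risk_gap decreases in c, positive at lo, negative at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if risk_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The root agrees with the tabled value $c = 76.8$ to the accuracy of the
table; the significance level and power then follow as
`1 - PHI(c - 75)` and `1 - PHI(c - 78)`.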

Next, let us consider the Bayesian approach to the problem of
testing the simple hypothesis $H_0 : \theta = \theta'$ against the simple hypothesis
$H_1 : \theta = \theta''$. We continue to use the notation already presented in this
section. In addition, we recall that we need the p.d.f. $h(\theta)$ of the random
variable $\Theta$. Since the parameter space consists of but two points, $\theta'$
and $\theta''$, $\Theta$ is a random variable of the discrete type; and we have
$h(\theta') + h(\theta'') = 1$. Since $L(\theta; x_1, x_2, \ldots, x_n) = L(\theta)$ is the conditional
p.d.f. of $X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, the joint p.d.f. of $X_1, X_2, \ldots, X_n$
and $\Theta$ is

$$h(\theta) L(\theta; x_1, x_2, \ldots, x_n) = h(\theta) L(\theta).$$

Because

$$\sum_{\Omega} h(\theta) L(\theta) = h(\theta') L(\theta') + h(\theta'') L(\theta'')$$

is the marginal p.d.f. of $X_1, X_2, \ldots, X_n$, the conditional p.d.f. of $\Theta$,
given $X_1 = x_1, \ldots, X_n = x_n$, is

$$k(\theta \mid x_1, \ldots, x_n) = \frac{h(\theta) L(\theta)}{h(\theta') L(\theta') + h(\theta'') L(\theta'')}.$$

Now a Bayes' solution to a decision problem is defined in Section
8.1 as a $\delta(y)$ such that $E\{\mathcal{L}[\Theta, \delta(y)] \mid Y = y\}$ is a minimum. In this
problem if $\delta = \theta'$, the conditional expectation of $\mathcal{L}(\Theta, \delta)$, given
$X_1 = x_1, \ldots, X_n = x_n$, is

$$\sum_{\Omega} \mathcal{L}(\theta, \theta') k(\theta \mid x_1, \ldots, x_n) = \frac{\mathcal{L}(\theta'', \theta') h(\theta'') L(\theta'')}{h(\theta') L(\theta') + h(\theta'') L(\theta'')},$$

because $\mathcal{L}(\theta', \theta') = 0$; and if $\delta = \theta''$, this expectation is

$$\sum_{\Omega} \mathcal{L}(\theta, \theta'') k(\theta \mid x_1, \ldots, x_n) = \frac{\mathcal{L}(\theta', \theta'') h(\theta') L(\theta')}{h(\theta') L(\theta') + h(\theta'') L(\theta'')},$$

because $\mathcal{L}(\theta'', \theta'') = 0$. Accordingly, the Bayes' solution requires that
the decision $\delta = \theta''$ be made if

$$\frac{\mathcal{L}(\theta', \theta'') h(\theta') L(\theta')}{h(\theta') L(\theta') + h(\theta'') L(\theta'')} < \frac{\mathcal{L}(\theta'', \theta') h(\theta'') L(\theta'')}{h(\theta') L(\theta') + h(\theta'') L(\theta'')},$$

or, equivalently, if

$$\frac{L(\theta')}{L(\theta'')} < \frac{\mathcal{L}(\theta'', \theta') h(\theta'')}{\mathcal{L}(\theta', \theta'') h(\theta')}. \tag{2}$$

If the sign of inequality in expression (2) is reversed, we make the
decision $\delta = \theta'$; and if the two members of expression (2) are equal, we
can use some auxiliary random experiment to make the decision. It is
important to note that expression (2) describes, in accordance with the
Neyman-Pearson theorem, a best test.

Example 2. In addition to the information given in Example 1, suppose
that we know the prior probabilities for $\theta = \theta' = 75$ and for $\theta = \theta'' = 78$ to
be given, respectively, by $h(75) = \frac{1}{7}$ and $h(78) = \frac{6}{7}$. Then the Bayes' solution is,
in this case,

$$\frac{L(75)}{L(78)} < \frac{(1)(\frac{6}{7})}{(3)(\frac{1}{7})} = 2,$$

which is equivalent to $\bar{x} > 76.3$, approximately. The power of the test when
$H_0$ is true is $1 - \Phi(1.3) = 0.097$, approximately, and the power of the test when
$H_1$ is true is $1 - \Phi(-1.7) = \Phi(1.7) = 0.955$, approximately.
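The constant on the right of expression (2) and the resulting cutoff for
$\bar{x}$ can be computed as in the Python sketch below. This is an
illustration under assumptions: the function names are ours, and the prior
probabilities are taken to be $\frac{1}{7}$ and $\frac{6}{7}$, values
consistent with the stated cutoff of about 76.3. For a sample of size $n$
from $N(\theta, \sigma^2)$ the ratio $L(\theta')/L(\theta'')$ equals
$\exp[-n(\theta'' - \theta')(\bar{x} - m)/\sigma^2]$, where $m$ is the
midpoint of $\theta'$ and $\theta''$.

```python
import math
from statistics import NormalDist


def bayes_k(loss_wrong_accept, loss_wrong_reject, prior_h0, prior_h1):
    """Right-hand member of (2): L(theta'', theta') h(theta'') divided by
    L(theta', theta'') h(theta')."""
    return (loss_wrong_accept * prior_h1) / (loss_wrong_reject * prior_h0)


def xbar_cutoff(k, n=100, theta0=75.0, theta1=78.0, variance=100.0):
    """Value c such that L(theta0)/L(theta1) < k exactly when xbar > c."""
    midpoint = (theta0 + theta1) / 2.0
    return midpoint - variance * math.log(k) / (n * (theta1 - theta0))


k = bayes_k(1.0, 3.0, 1.0 / 7.0, 6.0 / 7.0)     # assumed priors 1/7, 6/7
c = xbar_cutoff(k)
std_error = math.sqrt(100.0 / 100)              # xbar is N(theta, 1)
power_h0 = 1.0 - NormalDist(75.0, std_error).cdf(c)
power_h1 = 1.0 - NormalDist(78.0, std_error).cdf(c)
```

The values of `power_h0` and `power_h1` differ slightly from the figures in
the example because the example rounds the cutoff to 76.3 before using the
table.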


In summary, we make the following comments. In testing the
simple hypothesis $H_0 : \theta = \theta'$ against the simple hypothesis $H_1 : \theta = \theta''$,
it is emphasized that each principle leads to critical regions of the form

$$\left\{ (x_1, \ldots, x_n) : \frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)} \le k \right\},$$

where $k$ is a positive constant. In the classical approach, we determine
$k$ by requiring that the power function of the test have a certain value
at the point $\theta = \theta'$ or at the point $\theta = \theta''$ (usually, the value $\alpha$ at the
point $\theta = \theta'$). The minimax decision requires $k$ to be selected so that

$$\mathcal{L}(\theta', \theta'') \int_{C} L(\theta') = \mathcal{L}(\theta'', \theta') \int_{C^*} L(\theta'').$$

Finally, the Bayes' procedure requires that

$$k = \frac{\mathcal{L}(\theta'', \theta') h(\theta'')}{\mathcal{L}(\theta', \theta'') h(\theta')}.$$

Each of these tests is a best test for testing a simple hypothesis
$H_0 : \theta = \theta'$ against a simple alternative hypothesis $H_1 : \theta = \theta''$.

The summary above has an interesting application to the problem
of classification, which can be described as follows. An investigator
makes a number of measurements on an item and wants to place it into
one of several categories (or classify it). For convenience in our
discussion, we assume that only two measurements, say $X$ and $Y$, are
made on the item to be classified. Moreover, let $X$ and $Y$ have a joint
p.d.f. $f(x, y; \theta)$, where the parameter $\theta$ represents one or more
parameters. In our simplification, suppose that there are only two
possible joint distributions (categories) for $X$ and $Y$, which are indexed
by the parameter values $\theta'$ and $\theta''$, respectively. In this case, the problem
then reduces to one of observing $X = x$ and $Y = y$ and
then testing the hypothesis $\theta = \theta'$ against the hypothesis $\theta = \theta''$, with
the classification of $X$ and $Y$ being in accord with which hypothesis is
accepted. From the Neyman-Pearson theorem, we know that a best
decision of this sort is of the form: If

$$\frac{f(x, y; \theta')}{f(x, y; \theta'')} \le k,$$

choose the distribution indexed by $\theta''$; that is, we classify $(x, y)$ as
coming from the distribution indexed by $\theta''$. Otherwise, choose the
distribution indexed by $\theta'$; that is, we classify $(x, y)$ as coming from the
distribution indexed by $\theta'$. Here $k$ can be selected by considering the
power function, a minimax decision, or a Bayes' procedure. We favor
the latter if the losses and prior probabilities are known.

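Example 3. Let $(x, y)$ be an observation of the random pair $(X, Y)$, which
has a bivariate normal distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. In
Section 3.5 that joint p.d.f. is given by

$$f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}\, e^{-q(x, y; \mu_1, \mu_2)/2},$$
$$-\infty < x < \infty, \quad -\infty < y < \infty,$$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, and

$$q(x, y; \mu_1, \mu_2) = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x - \mu_1}{\sigma_1} \right) \left( \frac{y - \mu_2}{\sigma_2} \right) + \left( \frac{y - \mu_2}{\sigma_2} \right)^2 \right].$$

Assume that $\sigma_1^2$, $\sigma_2^2$, and $\rho$ are known but that we do not know whether the
respective means of $(X, Y)$ are $(\mu_1', \mu_2')$ or $(\mu_1'', \mu_2'')$. The inequality

$$\frac{f(x, y; \mu_1', \mu_2', \sigma_1^2, \sigma_2^2, \rho)}{f(x, y; \mu_1'', \mu_2'', \sigma_1^2, \sigma_2^2, \rho)} \le k$$

is equivalent to

$$\tfrac{1}{2} [q(x, y; \mu_1'', \mu_2'') - q(x, y; \mu_1', \mu_2')] \le \ln k.$$

Moreover, it is clear that the difference in the left-hand member of this
inequality does not contain terms involving $x^2$, $xy$, and $y^2$. In particular, this
inequality is the same as

$$\frac{1}{1 - \rho^2} \left\{ \left[ \frac{\mu_1' - \mu_1''}{\sigma_1^2} - \frac{\rho (\mu_2' - \mu_2'')}{\sigma_1 \sigma_2} \right] x + \left[ \frac{\mu_2' - \mu_2''}{\sigma_2^2} - \frac{\rho (\mu_1' - \mu_1'')}{\sigma_1 \sigma_2} \right] y \right\}$$
$$\le \ln k + \tfrac{1}{2} [q(0, 0; \mu_1', \mu_2') - q(0, 0; \mu_1'', \mu_2'')], \tag{3}$$

or, for brevity,

$$ax + by \le c.$$

That is, if this linear function of $x$ and $y$ in the left-hand member of inequality
(3) is less than or equal to a certain constant, we would classify that $(x, y)$ as
coming from the bivariate normal distribution with means $\mu_1''$ and $\mu_2''$.
Otherwise, we would classify $(x, y)$ as arising from the bivariate normal
distribution with means $\mu_1'$ and $\mu_2'$. Of course, if the prior probabilities and
losses are given, $k$ and thus $c$ can be found easily; this will be illustrated in
Exercise 9.43.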
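The coefficients $a$, $b$, and $c$ of the linear rule in Example 3 can be
computed as in the following Python sketch. It is an illustration only; the
function names and the labels returned by `classify` are ours. The
arguments `s1` and `s2` are the standard deviations $\sigma_1$ and
$\sigma_2$, and `mu_p` and `mu_pp` are the pairs of means
$(\mu_1', \mu_2')$ and $(\mu_1'', \mu_2'')$.

```python
import math


def q0(mu1, mu2, s1, s2, rho):
    """The quadratic form q(0, 0; mu1, mu2)."""
    u, v = mu1 / s1, mu2 / s2
    return (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)


def discriminant(mu_p, mu_pp, s1, s2, rho, k):
    """Return (a, b, c) so that f'/f'' <= k exactly when a*x + b*y <= c."""
    d1 = mu_p[0] - mu_pp[0]
    d2 = mu_p[1] - mu_pp[1]
    scale = 1.0 - rho * rho
    a = (d1 / s1**2 - rho * d2 / (s1 * s2)) / scale
    b = (d2 / s2**2 - rho * d1 / (s1 * s2)) / scale
    c = math.log(k) + 0.5 * (q0(*mu_p, s1, s2, rho) - q0(*mu_pp, s1, s2, rho))
    return a, b, c


def classify(x, y, a, b, c):
    """Name the set of means to which (x, y) is assigned."""
    return "double-primed means" if a * x + b * y <= c else "primed means"
```

For instance, with means $(0, 0)$ and $(1, 1)$, unit variances, $\rho = 0$,
and $k = 1$, the rule reduces to $-x - y \le -1$, so a point is assigned to
the means $(1, 1)$ exactly when $x + y \ge 1$.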

Once the rule for classification is established, the statistician might
be interested in the two probabilities of misclassifications using that
rule. The first of these two is associated with the classification of $(x, y)$
as arising from the distribution indexed by $\theta''$ if, in fact, it comes from
that indexed by $\theta'$. The second misclassification is similar, but with the
interchange of $\theta'$ and $\theta''$. In the preceding example, the probabilities
of these respective misclassifications are

$$\Pr(aX + bY \le c; \mu_1', \mu_2') \quad \text{and} \quad \Pr(aX + bY > c; \mu_1'', \mu_2'').$$

Fortunately, the distribution of $Z = aX + bY$ is easy to determine,
so each of these probabilities is easy to calculate. The m.g.f. of $Z$ is

$$E(e^{tZ}) = E[e^{t(aX + bY)}] = E(e^{atX + btY}).$$

Hence in the joint m.g.f. of $X$ and $Y$ found in Section 3.5, simply replace
$t_1$ by $at$ and $t_2$ by $bt$ to obtain

$$E(e^{tZ}) = \exp\left[ \mu_1 a t + \mu_2 b t + \frac{\sigma_1^2 (at)^2 + 2\rho \sigma_1 \sigma_2 (at)(bt) + \sigma_2^2 (bt)^2}{2} \right].$$

However, this is the m.g.f. of the normal distribution

$$N(a\mu_1 + b\mu_2,\ a^2 \sigma_1^2 + 2ab\rho \sigma_1 \sigma_2 + b^2 \sigma_2^2).$$

With this information, it is easy to compute the probabilities of
misclassifications, and this will also be demonstrated in Exercise 9.43.
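Since $Z = aX + bY$ is normal with the mean and variance just given, the
two misclassification probabilities reduce to evaluations of a normal
distribution function. The Python sketch below illustrates this; the
function names are ours, and `s1` and `s2` denote the standard deviations
$\sigma_1$ and $\sigma_2$.

```python
import math
from statistics import NormalDist


def linear_combination(a, b, mu1, mu2, s1, s2, rho):
    """Distribution of Z = aX + bY for a bivariate normal (X, Y)."""
    mean = a * mu1 + b * mu2
    variance = a * a * s1 * s1 + 2 * a * b * rho * s1 * s2 + b * b * s2 * s2
    return NormalDist(mean, math.sqrt(variance))


def misclassification_probs(a, b, c, mu_p, mu_pp, s1, s2, rho):
    """Return (Pr(aX+bY <= c; primed means), Pr(aX+bY > c; double-primed))."""
    z_primed = linear_combination(a, b, mu_p[0], mu_p[1], s1, s2, rho)
    z_double = linear_combination(a, b, mu_pp[0], mu_pp[1], s1, s2, rho)
    return z_primed.cdf(c), 1.0 - z_double.cdf(c)
```

As a check on a symmetric case chosen by us, the rule $-x - y \le -1$ with
means $(0, 0)$ and $(1, 1)$, unit variances, and $\rho = 0$ gives equal
misclassification probabilities, each equal to $\Phi(-1/\sqrt{2})$.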

One final remark must be made with respect to the use of the
important classification rule established in Example 3. In most
instances the parameter values $\mu_1'$, $\mu_2'$ and $\mu_1''$, $\mu_2''$ as well as $\sigma_1^2$, $\sigma_2^2$, and
$\rho$ are unknown. In such cases the statistician has usually observed a
random sample (frequently called a training sample) from each of the
two distributions. Let us say the samples have sizes $n'$ and $n''$,
respectively, with sample characteristics

$$\bar{x}', \bar{y}', (s_x')^2, (s_y')^2, r' \quad \text{and} \quad \bar{x}'', \bar{y}'', (s_x'')^2, (s_y'')^2, r''.$$

Accordingly, if in inequality (3) the parameters $\mu_1'$, $\mu_2'$, $\mu_1''$, $\mu_2''$, $\sigma_1^2$, $\sigma_2^2$,
and $\rho \sigma_1 \sigma_2$ are replaced by the unbiased estimates

$$\bar{x}', \quad \bar{y}', \quad \bar{x}'', \quad \bar{y}'', \quad \frac{n'(s_x')^2 + n''(s_x'')^2}{n' + n'' - 2}, \quad \frac{n'(s_y')^2 + n''(s_y'')^2}{n' + n'' - 2},$$
$$\frac{n' r' s_x' s_y' + n'' r'' s_x'' s_y''}{n' + n'' - 2},$$

the resulting expression in the left-hand member is frequently called
Fisher's linear discriminant function. Since those parameters have been
estimated, the distribution theory associated with $aX + bY$ is not
appropriate for Fisher's function. However, if $n'$ and $n''$ are large, the
distribution of $aX + bY$ does provide an approximation.
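The substitution of estimates into inequality (3) can be sketched in
Python as follows. This is an illustration under assumptions: the function
names are ours, each training sample is taken to be a list of $(x, y)$
pairs, and $s^2$ is taken to be the sample variance with divisor $n$, so
that $n s_x^2$ is the sum of squared deviations and $n r s_x s_y$ is the sum
of cross products. The returned pair is the estimated $(a, b)$ of the
left-hand member.

```python
def centred_sums(sample):
    """Return n, the two means, and the centred sums of squares/products."""
    n = len(sample)
    xbar = sum(x for x, _ in sample) / n
    ybar = sum(y for _, y in sample) / n
    sxx = sum((x - xbar) ** 2 for x, _ in sample)           # n * s_x^2
    syy = sum((y - ybar) ** 2 for _, y in sample)           # n * s_y^2
    sxy = sum((x - xbar) * (y - ybar) for x, y in sample)   # n * r * s_x * s_y
    return n, xbar, ybar, sxx, syy, sxy


def fisher_coefficients(sample_p, sample_pp):
    """Estimated (a, b) of inequality (3) from two training samples."""
    n1, x1, y1, sxx1, syy1, sxy1 = centred_sums(sample_p)
    n2, x2, y2, sxx2, syy2, sxy2 = centred_sums(sample_pp)
    dof = n1 + n2 - 2
    var_x = (sxx1 + sxx2) / dof        # estimate of sigma_1^2
    var_y = (syy1 + syy2) / dof        # estimate of sigma_2^2
    cov_xy = (sxy1 + sxy2) / dof       # estimate of rho*sigma_1*sigma_2
    d1, d2 = x1 - x2, y1 - y2
    det = var_x * var_y - cov_xy ** 2
    a = (d1 * var_y - cov_xy * d2) / det
    b = (d2 * var_x - cov_xy * d1) / det
    return a, b
```

The expressions for `a` and `b` are the coefficients of (3) rewritten in
terms of $\sigma_1^2$, $\sigma_2^2$, and $\rho \sigma_1 \sigma_2$, which is
the form in which the estimates are available.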

Although we have considered only bivariate distributions in this 
section, the results can easily be extended to multivariate normal 
distributions after a study of Sections 4.10, 10.8, and 10.9. 


EXERCISES 

9.41. Let $X_1, X_2, \ldots, X_{20}$ be a random sample of size 20 from a distribution
which is $N(\theta, 5)$. Let $L(\theta)$ represent the joint p.d.f. of $X_1, X_2, \ldots, X_{20}$. The
problem is to test $H_0 : \theta = 1$ against $H_1 : \theta = 0$. Thus $\Omega = \{\theta : \theta = 0, 1\}$.

(a) Show that $L(1)/L(0) \le k$ is equivalent to $\bar{x} \le c$.

(b) Find $c$ so that the significance level is $\alpha = 0.05$. Compute the power of
this test if $H_1$ is true.

(c) If the loss function is such that $\mathcal{L}(1, 1) = \mathcal{L}(0, 0) = 0$ and
$\mathcal{L}(1, 0) = \mathcal{L}(0, 1) > 0$, find the minimax test. Evaluate the power
function of this test at the points $\theta = 1$ and $\theta = 0$.
(d) If, in addition, the prior probabilities of $\theta = 1$ and $\theta = 0$ are,
respectively, $h(1) =$ [illegible] and $h(0) =$ [illegible], find the Bayes' test.
Evaluate the power function of this test at the points $\theta = 1$ and $\theta = 0$.

9.42. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a Poisson
distribution with parameter $\theta$. Let $L(\theta)$ be the joint p.d.f. of $X_1, X_2, \ldots, X_{10}$.
The problem is to test $H_0 : \theta = \frac{1}{2}$ against $H_1 : \theta = 1$.

(a) Show that $L(\frac{1}{2})/L(1) \le k$ is equivalent to $y = \sum_1^{10} x_i \ge c$.

(b) In order to make $\alpha = 0.05$, show that $H_0$ is rejected if $y > 9$ and, if $y = 9$,
reject $H_0$ with probability $\frac{1}{2}$ (using some auxiliary random experiment).

(c) If the loss function is such that $\mathcal{L}(\frac{1}{2}, \frac{1}{2}) = \mathcal{L}(1, 1) = 0$ and $\mathcal{L}(\frac{1}{2}, 1) = 1$
and $\mathcal{L}(1, \frac{1}{2}) = 2$, show that the minimax procedure is to reject $H_0$ if $y > 6$
and, if $y = 6$, reject $H_0$ with probability 0.08 (using some auxiliary
random experiment).

(d) If, in addition, we are given that the prior probabilities of $\theta = \frac{1}{2}$ and
$\theta = 1$ are $h(\frac{1}{2}) = \frac{1}{3}$ and $h(1) = \frac{2}{3}$, respectively, show that the Bayes'
solution is to reject $H_0$ if $y > 5.2$, that is, reject $H_0$ if $y \ge 6$.


9.43. In Example 3 let /ij = ^ = /if = = 1, <r^ = 1, Oj = 1, and p =\- 

(a) Evaluate inequality (3) when the prior probabilities are h(ji\ ， / 4) = I 

and ^ 2 ) = 5 and the losses are S£[d = (n\, /ij), S = (^, — 4 

and £^[6 = , fi' 2 ), 3 ^ (jx\, fi^)] — 1 . 

(b) Find the distribution of the linear function aX + bY that results from 
part (a). 

(c) Compute Pr (aX + bY < c\ = n ' 2 = 0) and Pr (aX + bY > c; = 

甿 =1). ， 

~ '* . * : , 

9.44. Let X and Y have the joint p.d.f. 

£ 

e 2 

zero elsewhere, where 0 < 0 < An observation (x, 少 ） arises from the 

joint distribution with parameters equal to either { 6 \ = \, 62 = 5) or ( 6 '{ = 3, 
02 = 2). Determine the form of the classification rule. 

9.45. Let X and ^ have a joint bivariate normal distribution. An observation 
(x, y) arises from the joint distribution with parameters equal to either 

"i = W = 0, (ff\y = (<tIY = 1, p , = 5 

or 


0 < < 00, 0 〈少 < 00, 


f(x, y; 0,, 6 2 ) 


❹ 2 


exp 


x 

e x 


i^'\ = = 1 ， （ °i )〃 〒 4 ， (o\y — 9 , p" — j. 

Show that the classification rule involves a second degree polynomial in x 
and y. 
9.46. Let $X_1, \ldots, X_n$ be a random sample from a distribution with one
of the two probability density functions $(1/\rho) f_i[(x - \theta)/\rho]$, $-\infty < \theta < \infty$,
$\rho > 0$, $i = 1, 2$. We wish to decide from which of these distributions the
sample arose. We assign the respective prior probabilities $p_1$ and $p_2$ to $f_1$ and
$f_2$, where $p_1 + p_2 = 1$. If the prior p.d.f. assigned to the nuisance parameters
$\theta$ and $\rho$ is $g(\theta, \rho)$, the posterior probability of $f_i$ is proportional to


pM\x u … ， x„), where 
* X n) = 




fi 


- 

r~p~ 


■f, 


0 、 

,p j 


g{B, p) dd dp. 


I = 1; 2* 

■r » 

If the losses associated with the two wrong decisions are equal, we would 
select the p.d.f. with the largest posterior probability. 

(a) If ^(0; p) is a vague noninfoimative prior proportional to 1 jp, show that 


咖 x,，••• ， = 




dddp 



*QO 

X" ~ 2 f f {kx x —u) •- — u)dudX 

— CD 


by phanging variables through 6 — /i/A, p = 1/A. Hdjek and Sidak show 
that using this last expression, the Bayesian procedure of selecting f 2 
over /, if 

provides a most powerful location and scale invariant test of one model 
against another. 

(b) Evaluate i = 1, 2, given in ⑻ for /,(x) = 

— l<x< 1, zero elsewhere, and f 2 (x) is the p.d.f. of N(0, 1). Show that 
the most powerful location and scale invariant test for selecting the 
normal distribution over the uniform is of the form (y„ — Y^/S < k, 
-y where K| < Y 2 < • ■ • < Y H are the order statistics and S is the sample 
standard deviation. 


ADDITIONAL EXERCISES 

9.47. Consider a random sample $X_1, X_2, \ldots, X_n$ from a distribution with
p.d.f. $f(x; \theta) = \theta (1 - x)^{\theta - 1}$, $0 < x < 1$, zero elsewhere, where $\theta > 0$.

(a) Find the form of the uniformly most powerful test of $H_0 : \theta = 1$ against
$H_1 : \theta > 1$.

(b) What is the likelihood ratio $\lambda$ for testing $H_0 : \theta = 1$ against $H_1 : \theta \ne 1$?

9.48. Let X\,X 2 . X n be a random sample from a distribution with p.d.f. 





How would you classify W into I or II? 

9.54. Let ^ be Poisson 6. Find the sequential probability ratio test for 
testing H(f：8 = 0.05 against H t : 6 — 0.03. Write this in the form 


f(x; 6) = 0 < jc < 1 , zero elsewhere. 

(a) Find a complete sufficient statistic for 8. 

(b) If a = ^ = ^, find the sequential probability ratio test of H 0 :0 = 2 

against H t :0 = 3. -> … 

{ j ' I f 

9.49. Let X have a Poisson p.d.f. with parameter B. We shall use a random 
sample of size n to test H 0 :0 = l against H { : 6 ^ 1. 

(a) Find the likelihood ratio A for making this test. 

(b) Show that A can be expressed in terms of X, the mean of the sample, 
so that the test can be based upon X. 

9.50. Let X y , X 2 . X K and y,, Y 2 ,..., Y„ be independent random 

samples from two normal distributions a 2 ) and N(ji^ a 2 ), respectively, 

where a 2 is the common but unknown variance., 

(a) Find the likelihood ratio A for testing = 0 against all 

alternatives. 

(b) Rewrite A so that it is a function of a statistic Z which has a well-known 
distribution. 

(c) Give the distribution of Z under both null and alternative hypotheses. 

9.51. Let X t ,... ,X„ denote a random sample from a gamma-type 
distribution with alpha equal to 2 and beta equal to d. Let i/ 0 : 0 = 1 and 

(a) Shpw that there exists a uniformly most powerful test for H 0 against 
H t , determine the statistic Y upon which the test may be based, and 
indicate the nature of the best critical region. 

(b) Find the p.d.f. of the statistic Y in part (a). If we want a significance 
level of 0.05, write an equation which can be used to determine the 
critical region. Let K(d), 9 ^ 1 , be the power function of the test. 
Express the power function as an integral. 

9.52. Let (^i, y,), (X 2 , Y 2 ), .. ., (X„, Y n ) be a random sample from a 
bivariate normal distribution with ti ii n 1 ,<r\ = a\ — <T 1 ,p = 5 , where fi it /x 2 > 
and ir 2 > 0 are unknown real numbers. Find the likelihood ratio 又 for testing 
H 9 : /i, — H 2 =f 0, a 2 unknown against all alternatives. The likelihood ratio 
A is a function of what statistic that has a well-known distribution? 

9.53. Let W' = (JF,, W 2 )bean observation from one of two bivariate normal 
distributions, I and II, each with /x, = // 2 = 0 but with the respective 
variance-covariance matrices 


C3 

11 

nd 

a 


\m/ 

0 4 
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n 

Co(«) < Yi 不 ⑻， determining c 0 («) and c,(«) when a a = 0.10 and 

1 = 1 

P a = 0.05. 


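9.55. Let $X$ and $Y$ have the joint p.d.f.

$$f(x, y; \theta_1, \theta_2) = \frac{1}{\theta_1 \theta_2} \exp\left(-\frac{x}{\theta_1} - \frac{y}{\theta_2}\right), \qquad 0 < x < \infty,\ 0 < y < \infty,$$

zero elsewhere, where $0 < \theta_1$, $0 < \theta_2$. An observation $(x, y)$ arises from the
joint distribution with $\theta_1 = 10$, $\theta_2 = 5$ or $\theta_1 = 3$, $\theta_2 = 2$. Determine the
form of the classification rule.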
10.1 The Distributions of Certain Quadratic Forms

A homogeneous polynomial of degree 2 in $n$ variables is called
a quadratic form in those variables. If both the variables and
the coefficients are real, the form is called a real quadratic form.
Only real quadratic forms will be considered in this book. To
illustrate, the form $X_1^2 + X_1 X_2 + X_2^2$ is a quadratic form in the two
variables $X_1$ and $X_2$; the form $X_1^2 + X_2^2 + X_3^2 - 2X_1 X_2$ is a quadratic
form in the three variables $X_1$, $X_2$, and $X_3$; but the form
$(X_1 - 1)^2 + (X_2 - 2)^2 = X_1^2 + X_2^2 - 2X_1 - 4X_2 + 5$ is not a quadratic
form in $X_1$ and $X_2$, although it is a quadratic form in the variables
$X_1 - 1$ and $X_2 - 2$.

Let $\bar{X}$ and $S^2$ denote, respectively, the mean and the variance of a
random sample $X_1, X_2, \ldots, X_n$ from an arbitrary distribution. Thus

$$nS^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \left(X_i - \frac{X_1 + X_2 + \cdots + X_n}{n}\right)^2$$

$$= \frac{n-1}{n}\,(X_1^2 + X_2^2 + \cdots + X_n^2) - \frac{2}{n}\,(X_1 X_2 + \cdots + X_1 X_n + \cdots + X_{n-1} X_n)$$

is a quadratic form in the $n$ variables $X_1, X_2, \ldots, X_n$. If the sample
arises from a distribution that is $N(\mu, \sigma^2)$, we know that the
random variable $nS^2/\sigma^2$ is $\chi^2(n - 1)$ regardless of the value of $\mu$. This
fact proved useful in our search for a confidence interval for $\sigma^2$ when
$\mu$ is unknown.
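
As a quick numerical check of the quadratic-form expression for $nS^2$, the following sketch compares the direct sum of squared deviations with the expansion into squares and cross products. The function names and the simulated sample are illustrative assumptions, not part of the text.

```python
import random
from itertools import combinations

def n_s2_direct(xs):
    """n*S^2 computed as the sum of squared deviations from the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs)

def n_s2_quadratic_form(xs):
    """The same quantity written as a quadratic form in x_1, ..., x_n."""
    n = len(xs)
    squares = sum(x * x for x in xs)                      # x_1^2 + ... + x_n^2
    cross = sum(x * y for x, y in combinations(xs, 2))    # all x_i x_j with i < j
    return (n - 1) / n * squares - 2 / n * cross

random.seed(0)
sample = [random.gauss(3.0, 2.0) for _ in range(7)]  # hypothetical sample
print(n_s2_direct(sample), n_s2_quadratic_form(sample))
```

The two printed values should agree up to floating-point rounding, whatever distribution the sample comes from, since the identity is purely algebraic.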

It has been seen that tests of certain statistical hypotheses require
a statistic that is a quadratic form. For instance, Example 2, Section
9.2, made use of the statistic $\sum_{1}^{n} X_i^2$, which is a quadratic form in the
variables $X_1, X_2, \ldots, X_n$. Later in this chapter, tests of other statistical
hypotheses will be investigated, and it will be seen that functions of
statistics that are quadratic forms will be needed to carry out the tests
in an expeditious manner. But first we shall make a study of the
distribution of certain quadratic forms in normal and independent
random variables.

The following theorem will be proved in Section 10.9. 

Theorem 1. Let $Q = Q_1 + Q_2 + \cdots + Q_{k-1} + Q_k$, where $Q, Q_1, \ldots, Q_k$
are $k + 1$ random variables that are real quadratic forms in $n$
independent random variables which are normally distributed with the
means $\mu_1, \mu_2, \ldots, \mu_n$ and the same variance $\sigma^2$. Let $Q/\sigma^2$,
$Q_1/\sigma^2, \ldots, Q_{k-1}/\sigma^2$ have chi-square distributions with degrees of
freedom $r, r_1, \ldots, r_{k-1}$, respectively. Let $Q_k$ be nonnegative. Then:

(a) $Q_1, \ldots, Q_k$ are independent, and hence

(b) $Q_k/\sigma^2$ has a chi-square distribution with $r - (r_1 + \cdots + r_{k-1}) = r_k$
degrees of freedom.

Three examples illustrative of the theorem will follow. Each of 
these examples will deal with a distribution problem that is based 
on the remarks made in the subsequent paragraph. 

Let the random variable $X$ have a distribution that is $N(\mu, \sigma^2)$.
Let $a$ and $b$ denote positive integers greater than 1 and let
$n = ab$. Consider a random sample of size $n = ab$ from this normal
distribution. The observations of the random sample will be denoted
by the symbols

$$\begin{array}{cccccc}
X_{11}, & X_{12}, & \ldots, & X_{1j}, & \ldots, & X_{1b} \\
X_{21}, & X_{22}, & \ldots, & X_{2j}, & \ldots, & X_{2b} \\
\vdots & \vdots & & \vdots & & \vdots \\
X_{i1}, & X_{i2}, & \ldots, & X_{ij}, & \ldots, & X_{ib} \\
\vdots & \vdots & & \vdots & & \vdots \\
X_{a1}, & X_{a2}, & \ldots, & X_{aj}, & \ldots, & X_{ab}.
\end{array}$$

In this notation, the first subscript indicates the row, and the
second subscript indicates the column in which the observation
appears. Thus $X_{ij}$ is in row $i$ and column $j$, $i = 1, 2, \ldots, a$ and
$j = 1, 2, \ldots, b$. By assumption these $n = ab$ random variables are
independent, and each has the same normal distribution with
mean $\mu$ and variance $\sigma^2$. Thus, if we wish, we may consider each row
as being a random sample of size $b$ from the given distribution; and we
may consider each column as being a random sample of size $a$ from
the given distribution. We now define $a + b + 1$ statistics. They are

$$\bar{X}_{..} = \frac{X_{11} + \cdots + X_{1b} + \cdots + X_{a1} + \cdots + X_{ab}}{ab} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} X_{ij}}{ab},$$

$$\bar{X}_{i.} = \frac{X_{i1} + X_{i2} + \cdots + X_{ib}}{b} = \frac{\sum_{j=1}^{b} X_{ij}}{b}, \qquad i = 1, 2, \ldots, a,$$

and

$$\bar{X}_{.j} = \frac{X_{1j} + X_{2j} + \cdots + X_{aj}}{a} = \frac{\sum_{i=1}^{a} X_{ij}}{a}, \qquad j = 1, 2, \ldots, b.$$

The statistic $\bar{X}_{..}$ is the mean of the random sample of size $n = ab$; the
statistics $\bar{X}_{1.}, \bar{X}_{2.}, \ldots, \bar{X}_{a.}$ are, respectively, the means of the rows;
and the statistics $\bar{X}_{.1}, \ldots, \bar{X}_{.b}$ are, respectively, the means of the
columns. Three examples illustrative of the theorem will follow.



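Example 1. Consider the variance $S^2$ of the random sample of size $n = ab$.
We have the algebraic identity

$$abS^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} [(X_{ij} - \bar{X}_{i.}) + (\bar{X}_{i.} - \bar{X}_{..})]^2$$

$$= \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{X}_{i.} - \bar{X}_{..})^2 + 2 \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})(\bar{X}_{i.} - \bar{X}_{..}).$$

The last term of the right-hand member of this identity may be written

$$2 \sum_{i=1}^{a} \left[(\bar{X}_{i.} - \bar{X}_{..}) \sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})\right] = 2 \sum_{i=1}^{a} [(\bar{X}_{i.} - \bar{X}_{..})(b\bar{X}_{i.} - b\bar{X}_{i.})] = 0,$$

and the term

$$\sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{X}_{i.} - \bar{X}_{..})^2$$

may be written

$$b \sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2.$$

Thus

$$abS^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2 + b \sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_1 + Q_2.$$

Clearly, $Q$, $Q_1$, and $Q_2$ are quadratic forms in the $n = ab$ variables $X_{ij}$. We shall
use the theorem with $k = 2$ to show that $Q_1$ and $Q_2$ are independent. Since $S^2$
is the variance of a random sample of size $n = ab$ from the given normal
distribution, then $abS^2/\sigma^2$ has a chi-square distribution with $ab - 1$ degrees of
freedom. Now

$$\frac{Q_1}{\sigma^2} = \sum_{i=1}^{a} \left[\frac{\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2}{\sigma^2}\right].$$

For each fixed value of $i$, $\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2/b$ is the variance of a random
sample of size $b$ from the given normal distribution, and, accordingly,
$\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2/\sigma^2$ has a chi-square distribution with $b - 1$ degrees of freedom.
Because the $X_{ij}$ are independent, $Q_1/\sigma^2$ is the sum of $a$ independent random
variables, each having a chi-square distribution with $b - 1$ degrees of freedom.
Hence $Q_1/\sigma^2$ has a chi-square distribution with $a(b - 1)$ degrees of freedom.
Now $Q_2 = b \sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2 \ge 0$. In accordance with the theorem, $Q_1$ and
$Q_2$ are independent, and $Q_2/\sigma^2$ has a chi-square distribution with
$ab - 1 - a(b - 1) = a - 1$ degrees of freedom.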
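
The conclusions of Example 1 can be illustrated by simulation. The sketch below checks the identity $Q = Q_1 + Q_2$ on each simulated $a \times b$ array and compares the average values of $Q_1/\sigma^2$ and $Q_2/\sigma^2$ with $a(b-1)$ and $a-1$, the means of the stated chi-square distributions. The function name and the values of $a$, $b$, $\mu$, $\sigma$, and the number of replications are assumptions chosen only for illustration; a simulation supports, but does not prove, the distributional claims.

```python
import random

def row_decomposition(x):
    """Return (Q, Q1, Q2) for an array x given as a list of a rows of length b."""
    a, b = len(x), len(x[0])
    grand = sum(map(sum, x)) / (a * b)                 # X-bar..
    row_means = [sum(row) / b for row in x]            # X-bar_i.
    q = sum((v - grand) ** 2 for row in x for v in row)
    q1 = sum((v - m) ** 2 for row, m in zip(x, row_means) for v in row)
    q2 = b * sum((m - grand) ** 2 for m in row_means)
    return q, q1, q2

random.seed(1)
a, b, mu, sigma = 3, 4, 10.0, 2.0   # hypothetical settings
reps = 20000
mean_q1 = mean_q2 = 0.0
for _ in range(reps):
    x = [[random.gauss(mu, sigma) for _ in range(b)] for _ in range(a)]
    q, q1, q2 = row_decomposition(x)
    assert abs(q - (q1 + q2)) < 1e-8     # the algebraic identity
    mean_q1 += q1 / sigma**2 / reps
    mean_q2 += q2 / sigma**2 / reps

print(mean_q1, "should be near", a * (b - 1))
print(mean_q2, "should be near", a - 1)
```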

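Example 2. In $abS^2$ replace $X_{ij} - \bar{X}_{..}$ by $(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X}_{..})$ to obtain

$$abS^2 = \sum_{j=1}^{b} \sum_{i=1}^{a} [(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X}_{..})]^2,$$

or

$$abS^2 = \sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2 + a \sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_3 + Q_4.$$

It is easy to show (Exercise 10.1) that $Q_3/\sigma^2$ has a chi-square distribution with
$b(a - 1)$ degrees of freedom. Since $Q_4 = a \sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2 \ge 0$, the theorem
enables us to assert that $Q_3$ and $Q_4$ are independent and that $Q_4/\sigma^2$ has a
chi-square distribution with $ab - 1 - b(a - 1) = b - 1$ degrees of freedom.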

Example 3. In $abS^2$ replace $X_{ij} - \bar{X}_{..}$ by $(\bar{X}_{i.} - \bar{X}_{..}) + (\bar{X}_{.j} - \bar{X}_{..}) +
(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})$ to obtain (Exercise 10.2)

$$abS^2 = b \sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2 + a \sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_2 + Q_4 + Q_5,$$

where $Q_2$ and $Q_4$ are as defined in Examples 1 and 2. From Examples 1 and
2, $Q/\sigma^2$, $Q_2/\sigma^2$, and $Q_4/\sigma^2$ have chi-square distributions with $ab - 1$, $a - 1$, and
$b - 1$ degrees of freedom, respectively. Since $Q_5 \ge 0$, the theorem asserts that
$Q_2$, $Q_4$, and $Q_5$ are independent and that $Q_5/\sigma^2$ has a chi-square
distribution with $ab - 1 - (a - 1) - (b - 1) = (a - 1)(b - 1)$ degrees of
freedom.

Once these quadratic form statistics have been shown to be independent,
a multiplicity of $F$-statistics can be defined. For instance,

$$\frac{Q_4/[\sigma^2(b - 1)]}{Q_3/[\sigma^2 b(a - 1)]} = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

has an $F$-distribution with $b - 1$ and $b(a - 1)$ degrees of freedom; and

$$\frac{Q_4/[\sigma^2(b - 1)]}{Q_5/[\sigma^2(a - 1)(b - 1)]} = \frac{Q_4/(b - 1)}{Q_5/[(a - 1)(b - 1)]}$$

has an $F$-distribution with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom. In
the subsequent sections it will be seen that some likelihood ratio tests of certain
statistical hypotheses can be based on these $F$-statistics.
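
To make the bookkeeping of Examples 2 and 3 concrete, here is a minimal sketch that computes $Q$, $Q_2$, $Q_3$, $Q_4$, $Q_5$ and the two $F$-ratios just displayed for a small array. The data values and the function name are hypothetical, chosen only to exercise the formulas.

```python
def two_way_forms(x):
    """Return (Q, Q2, Q3, Q4, Q5) for an a-by-b array x (list of rows)."""
    a, b = len(x), len(x[0])
    cells = [(i, j) for i in range(a) for j in range(b)]
    grand = sum(map(sum, x)) / (a * b)                               # X-bar..
    row = [sum(r) / b for r in x]                                    # X-bar_i.
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]     # X-bar_.j
    q = sum((x[i][j] - grand) ** 2 for i, j in cells)
    q2 = b * sum((m - grand) ** 2 for m in row)
    q3 = sum((x[i][j] - col[j]) ** 2 for i, j in cells)
    q4 = a * sum((m - grand) ** 2 for m in col)
    q5 = sum((x[i][j] - row[i] - col[j] + grand) ** 2 for i, j in cells)
    return q, q2, q3, q4, q5

# Hypothetical observations: a = 4 rows, b = 3 columns.
x = [[7.1, 6.3, 4.8],
     [5.2, 3.9, 3.4],
     [6.0, 5.5, 4.1],
     [4.7, 4.4, 2.9]]
a, b = len(x), len(x[0])
q, q2, q3, q4, q5 = two_way_forms(x)

# F-ratios with (b-1, b(a-1)) and (b-1, (a-1)(b-1)) degrees of freedom.
f_columns_vs_q3 = (q4 / (b - 1)) / (q3 / (b * (a - 1)))
f_columns_vs_q5 = (q4 / (b - 1)) / (q5 / ((a - 1) * (b - 1)))
print(f_columns_vs_q3, f_columns_vs_q5)
```

Note that $\sigma^2$ cancels in each ratio, which is why neither statistic depends on the unknown variance.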

EXERCISES 

10.1. In Example 2 verify that $Q = Q_3 + Q_4$ and that $Q_3/\sigma^2$ has a chi-square
distribution with $b(a - 1)$ degrees of freedom.

10.2. In Example 3 verify that $Q = Q_2 + Q_4 + Q_5$.

10.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution
$N(\mu, \sigma^2)$. Show that

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=2}^{n} (X_i - \bar{X}')^2 + \frac{n - 1}{n}\,(X_1 - \bar{X}')^2,$$

where $\bar{X} = \sum_{i=1}^{n} X_i/n$ and $\bar{X}' = \sum_{i=2}^{n} X_i/(n - 1)$.

Hint: Replace $X_i - \bar{X}$ by $(X_i - \bar{X}') - (X_1 - \bar{X}')/n$. Show that
$\sum_{i=2}^{n} (X_i - \bar{X}')^2/\sigma^2$ has a chi-square distribution with $n - 2$ degrees of
freedom. Prove that the two terms in the right-hand member are
independent. What then is the distribution of

$$\frac{[(n - 1)/n](X_1 - \bar{X}')^2}{\sigma^2}\,?$$

10.4. Let $X_{ijk}$, $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$, be a random sample
of size $n = abc$ from a normal distribution $N(\mu, \sigma^2)$. Let
$\bar{X}_{...} = \sum_{k=1}^{c} \sum_{j=1}^{b} \sum_{i=1}^{a} X_{ijk}/n$ and $\bar{X}_{i..} = \sum_{k=1}^{c} \sum_{j=1}^{b} X_{ijk}/bc$. Show that

$$\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (X_{ijk} - \bar{X}_{...})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (X_{ijk} - \bar{X}_{i..})^2 + bc \sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2.$$

Show that $\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (X_{ijk} - \bar{X}_{i..})^2/\sigma^2$ has a chi-square distribution with
$a(bc - 1)$ degrees of freedom. Prove that the two terms in the right-hand
member are independent. What, then, is the distribution of
$bc \sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2/\sigma^2$? Furthermore, let $\bar{X}_{.j.} = \sum_{k=1}^{c} \sum_{i=1}^{a} X_{ijk}/ac$ and
$\bar{X}_{ij.} = \sum_{k=1}^{c} X_{ijk}/c$. Show that

$$\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (X_{ijk} - \bar{X}_{...})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij.})^2 + bc \sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2 + ac \sum_{j=1}^{b} (\bar{X}_{.j.} - \bar{X}_{...})^2$$

$$+ c \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2.$$

Show that the four terms in the right-hand member, when divided by $\sigma^2$,
are independent chi-square variables with $ab(c - 1)$, $a - 1$, $b - 1$, and
$(a - 1)(b - 1)$ degrees of freedom, respectively.

10.5. Let $X_1, X_2, X_3, X_4$ be a random sample of size $n = 4$ from the normal
distribution $N(0, 1)$. Show that $\sum_{i=1}^{4} (X_i - \bar{X})^2$ equals

$$\frac{(X_1 - X_2)^2}{2} + \frac{[X_3 - (X_1 + X_2)/2]^2}{3/2} + \frac{[X_4 - (X_1 + X_2 + X_3)/3]^2}{4/3}$$

and argue that these three terms are independent, each with a chi-square
distribution with 1 degree of freedom.

10.2 A Test of the Equality of Several Means 

Consider $b$ independent random variables that have normal
distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_b$, respectively, and
unknown but common variance $\sigma^2$. Let $X_{1j}, X_{2j}, \ldots, X_{aj}$ represent a
random sample of size $a$ from the normal distribution with mean $\mu_j$
and variance $\sigma^2$, $j = 1, 2, \ldots, b$. It is desired to test the composite
hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$, $\mu$ unspecified, against all
possible alternative hypotheses $H_1$. A likelihood ratio test will be used.
Here the total parameter space is

$$\Omega = \{(\mu_1, \mu_2, \ldots, \mu_b, \sigma^2) : -\infty < \mu_j < \infty,\ 0 < \sigma^2 < \infty\}$$

and

$$\omega = \{(\mu_1, \mu_2, \ldots, \mu_b, \sigma^2) : -\infty < \mu_1 = \mu_2 = \cdots = \mu_b = \mu < \infty,\ 0 < \sigma^2 < \infty\}.$$
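The likelihood functions, denoted by $L(\omega)$ and $L(\Omega)$ are, respectively,

$$L(\omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{ab/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu)^2\right]$$

and

$$L(\Omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{ab/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu_j)^2\right].$$

Now

$$\frac{\partial \ln L(\omega)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu)$$

and

$$\frac{\partial \ln L(\omega)}{\partial (\sigma^2)} = -\frac{ab}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu)^2.$$

If we equate these partial derivatives to zero, the solutions for $\mu$ and
$\sigma^2$ are, respectively, in $\omega$,

$$\frac{\sum_{j=1}^{b} \sum_{i=1}^{a} x_{ij}}{ab} = \bar{x}_{..}, \qquad \frac{\sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}{ab} = v, \tag{1}$$

and these values maximize $L(\omega)$. Furthermore,

$$\frac{\partial \ln L(\Omega)}{\partial \mu_j} = \frac{1}{\sigma^2} \sum_{i=1}^{a} (x_{ij} - \mu_j), \qquad j = 1, 2, \ldots, b,$$

and

$$\frac{\partial \ln L(\Omega)}{\partial (\sigma^2)} = -\frac{ab}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu_j)^2.$$

If we equate these partial derivatives to zero, the solutions for
$\mu_1, \mu_2, \ldots, \mu_b$, and $\sigma^2$ are, respectively, in $\Omega$,

$$\frac{\sum_{i=1}^{a} x_{ij}}{a} = \bar{x}_{.j}, \quad j = 1, 2, \ldots, b, \qquad \frac{\sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}{ab} = w, \tag{2}$$

and these values maximize $L(\Omega)$. These maxima are, respectively,

$$L(\hat{\omega}) = \left[\frac{ab}{2\pi \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right]^{ab/2} e^{-ab/2}$$

and

$$L(\hat{\Omega}) = \left[\frac{ab}{2\pi \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}\right]^{ab/2} e^{-ab/2}.$$

Finally,

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \left[\frac{\sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}{\sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right]^{ab/2}.$$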

In the notation of Section 10.1, the statistics defined by the
functions $\bar{x}_{..}$ and $v$ given by Equations (1) of this section are

$$\bar{X}_{..} = \frac{\sum_{j=1}^{b} \sum_{i=1}^{a} X_{ij}}{ab} \qquad \text{and} \qquad S^2 = \frac{\sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{..})^2}{ab} = \frac{Q}{ab},$$

while the statistics defined by the functions $\bar{x}_{.1}, \bar{x}_{.2}, \ldots, \bar{x}_{.b}$ and $w$
given by Equations (2) in this section are, respectively, $\bar{X}_{.j} = \sum_{i=1}^{a} X_{ij}/a$,
$j = 1, 2, \ldots, b$, and $Q_3/ab = \sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/ab$. Thus, in the
notation of Section 10.1, $\lambda^{2/ab}$ defines the statistic $Q_3/Q$.

We reject the hypothesis $H_0$ if $\lambda \le \lambda_0$. To find $\lambda_0$ so that we have
a desired significance level $\alpha$, we must assume that the hypothesis $H_0$
is true. If the hypothesis $H_0$ is true, the random variables $X_{ij}$ constitute
a random sample of size $n = ab$ from a distribution that is normal with
mean $\mu$ and variance $\sigma^2$. This being the case, it was shown in Example
2, Section 10.1, that $Q = Q_3 + Q_4$, where $Q_4 = a \sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2$; that
$Q_3$ and $Q_4$ are independent; and that $Q_3/\sigma^2$ and $Q_4/\sigma^2$ have chi-square
distributions with $b(a - 1)$ and $b - 1$ degrees of freedom, respectively.
Thus the statistic defined by $\lambda^{2/ab}$ may be written

$$\frac{Q_3}{Q_3 + Q_4} = \frac{1}{1 + Q_4/Q_3}.$$

The significance level of the test of $H_0$ is

$$\alpha = \Pr\left[\frac{1}{1 + Q_4/Q_3} \le \lambda_0^{2/ab};\ H_0\right] = \Pr\left[\frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]} \ge c;\ H_0\right],$$

where

$$c = \frac{b(a - 1)}{b - 1}\,\left(\lambda_0^{-2/ab} - 1\right).$$

But

$$F = \frac{Q_4/[\sigma^2(b - 1)]}{Q_3/[\sigma^2 b(a - 1)]} = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

has an $F$-distribution with $b - 1$ and $b(a - 1)$ degrees of freedom.
Hence the test of the composite hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$,
$\mu$ unspecified, against all possible alternatives may
be based on an $F$-statistic. The constant $c$ is so selected as to
yield the desired value of $\alpha$.
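
The relation between $\lambda$ and $F$ derived above can be checked numerically. The sketch below computes $\lambda = (Q_3/Q)^{ab/2}$ and the $F$-statistic for a small array and confirms that $\lambda^{2/ab} = 1/\{1 + (b - 1)F/[b(a - 1)]\}$, so that small values of $\lambda$ correspond to large values of $F$. The data and the function name are hypothetical; each column plays the role of one of the $b$ samples of size $a$.

```python
def likelihood_ratio_and_f(x):
    """x[i][j] is observation i (of a) from sample j (of b). Return (lambda, F)."""
    a, b = len(x), len(x[0])
    cells = [(i, j) for i in range(a) for j in range(b)]
    grand = sum(map(sum, x)) / (a * b)                               # x-bar..
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]     # x-bar_.j
    q = sum((x[i][j] - grand) ** 2 for i, j in cells)
    q3 = sum((x[i][j] - col[j]) ** 2 for i, j in cells)
    q4 = a * sum((m - grand) ** 2 for m in col)
    lam = (q3 / q) ** (a * b / 2)
    f = (q4 / (b - 1)) / (q3 / (b * (a - 1)))
    return lam, f

# Hypothetical data: a = 3 observations from each of b = 3 distributions.
x = [[4.1, 5.0, 6.2],
     [3.8, 5.6, 5.9],
     [4.4, 5.3, 6.5]]
a, b = len(x), len(x[0])
lam, f = likelihood_ratio_and_f(x)

lhs = lam ** (2 / (a * b))
rhs = 1 / (1 + (b - 1) * f / (b * (a - 1)))
print(lam, f, lhs, rhs)
```

To carry out the test one would compare the computed $F$ with a tabled constant $c$ for $b - 1$ and $b(a - 1)$ degrees of freedom; the table lookup is not part of this sketch.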


Remark. It should be pointed out that a test of the equality of the $b$ means
$\mu_j$, $j = 1, 2, \ldots, b$, does not require that we take a random sample of size $a$
from each of the $b$ normal distributions. That is, the samples may be of
different sizes, say $a_1, a_2, \ldots, a_b$. A consideration of this procedure is left to
Exercise 10.6.






Suppose now that we wish to compute the power of the test of $H_0$
against $H_1$ when $H_0$ is false, that is, when we do not have
$\mu_1 = \mu_2 = \cdots = \mu_b = \mu$. It will be seen in Section 10.3 that when $H_1$ is
true, no longer is $Q_4/\sigma^2$ a random variable that is $\chi^2(b - 1)$. Thus we
cannot use an $F$-statistic to compute the power of the test when $H_1$ is
true. This problem is discussed in Section 10.3.

An observation should be made in connection with maximizing a
likelihood function with respect to certain parameters. Sometimes it is
easier to avoid the use of the calculus. For example, $L(\Omega)$ of this section
can be maximized with respect to $\mu_j$, for every fixed positive $\sigma^2$, by
minimizing

$$z = \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \mu_j)^2$$

with respect to $\mu_j$, $j = 1, 2, \ldots, b$. Now $z$ can be written as

$$z = \sum_{j=1}^{b} \sum_{i=1}^{a} [(x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \mu_j)]^2 = \sum_{j=1}^{b} \sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2 + a \sum_{j=1}^{b} (\bar{x}_{.j} - \mu_j)^2.$$

Since each term in the right-hand member of the preceding equation
is nonnegative, clearly $z$ is a minimum, with respect to $\mu_j$, if we take
$\mu_j = \bar{x}_{.j}$, $j = 1, 2, \ldots, b$.

EXERCISES 

10.6. Let $X_{1j}, X_{2j}, \ldots, X_{a_j j}$ represent independent random samples of
sizes $a_j$ from normal distributions with means $\mu_j$ and variances $\sigma^2$,
$j = 1, 2, \ldots, b$. Show that

$$\sum_{j=1}^{b} \sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{b} \sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{.j})^2 + \sum_{j=1}^{b} a_j (\bar{X}_{.j} - \bar{X}_{..})^2,$$

or $Q' = Q_3' + Q_4'$. Here $\bar{X}_{..} = \sum_{j=1}^{b} \sum_{i=1}^{a_j} X_{ij} \big/ \sum_{j=1}^{b} a_j$ and $\bar{X}_{.j} = \sum_{i=1}^{a_j} X_{ij}/a_j$. If
$\mu_1 = \mu_2 = \cdots = \mu_b$, show that $Q'/\sigma^2$ and $Q_3'/\sigma^2$ have chi-square
distributions. Prove that $Q_3'$ and $Q_4'$ are independent, and hence $Q_4'/\sigma^2$ also
has a chi-square distribution. If the likelihood ratio $\lambda$ is used to test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$, $\mu$ unspecified and $\sigma^2$ unknown, against all
possible alternatives, show that $\lambda \le \lambda_0$ is equivalent to the computed $F \ge c$,
where

$$F = \frac{\left(\sum_{j=1}^{b} a_j - b\right) Q_4'}{(b - 1)\, Q_3'}.$$

What is the distribution of $F$ when $H_0$ is true?


10.7. Consider the $T$-statistic that was derived through a likelihood ratio
for testing the equality of the means of two normal distributions
having common variance in Example 2 in Section 9.3. Show that $T^2$ is
exactly the $F$-statistic of Exercise 10.6 with $a_1 = n$, $a_2 = m$, and $b = 2$.
Of course, $X_1, \ldots, X_n, \bar{X}$ are replaced with $X_{11}, \ldots, X_{1n}, \bar{X}_{1.}$ and
$Y_1, \ldots, Y_m, \bar{Y}$ by $X_{21}, \ldots, X_{2m}, \bar{X}_{2.}$.

10.8. In Exercise 10.6, show that the linear functions $X_{ij} - \bar{X}_{.j}$ and $\bar{X}_{.j} - \bar{X}_{..}$
are uncorrelated.

Hint: Recall the definitions of $\bar{X}_{.j}$ and $\bar{X}_{..}$ and, without loss of generality,
we can let $E(X_{ij}) = 0$ for all $i, j$.

10.9. The following are observations associated with independent random
samples from three normal distributions having equal variances and
respective means $\mu_1$, $\mu_2$, $\mu_3$.

      I       II      III
     0.5     2.1     3.0
     1.3     3.3     5.1
    -1.0     0.0     1.9
     1.8     2.3     2.4
             2.5     4.2
                     4.1

Compute the $F$-statistic that is used to test $H_0: \mu_1 = \mu_2 = \mu_3$.
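
One way to organize the arithmetic for data of this kind is sketched below, using the unequal-sample-size statistic $F = (\sum a_j - b)Q_4'/[(b - 1)Q_3']$ of Exercise 10.6. The function name is an assumption, and the assignment of the fifteen observations to samples I, II, and III (sizes 4, 5, and 6) reflects our reading of the table layout.

```python
def one_way_f(samples):
    """F-statistic for b independent samples with possibly unequal sizes a_j."""
    b = len(samples)
    n = sum(len(s) for s in samples)                       # sum of the a_j
    grand = sum(sum(s) for s in samples) / n               # x-bar..
    means = [sum(s) / len(s) for s in samples]             # x-bar_.j
    q3 = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)   # within
    q4 = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means)) # among
    return (n - b) * q4 / ((b - 1) * q3)

samples = [
    [0.5, 1.3, -1.0, 1.8],             # sample I
    [2.1, 3.3, 0.0, 2.3, 2.5],         # sample II
    [3.0, 5.1, 1.9, 2.4, 4.2, 4.1],    # sample III
]
f_stat = one_way_f(samples)
print(round(f_stat, 2))
```

Under $H_0$ the statistic would be referred to an $F$-distribution with $b - 1 = 2$ and $\sum a_j - b = 12$ degrees of freedom.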

10.10. Using the notation of this section, assume that the means satisfy the
condition that $\mu = \mu_1 + (b - 1)d = \mu_2 - d = \mu_3 - d = \cdots = \mu_b - d$. That
is, the last $b - 1$ means are equal but differ from the first mean $\mu_1$, provided
that $d \ne 0$. Let independent random samples of size $a$ be taken from the $b$
normal distributions with common unknown variance $\sigma^2$.

(a) Show that the maximum likelihood estimators of $\mu$ and $d$ are $\hat{\mu} = \bar{X}_{..}$
and

$$\hat{d} = \frac{\sum_{j=2}^{b} \bar{X}_{.j}/(b - 1) - \bar{X}_{.1}}{b}.$$

(b) Using Exercise 10.3, find $Q_6$ and $Q_7 = c\hat{d}^2$ so that, when $d = 0$, $Q_7/\sigma^2$
is $\chi^2(1)$ and

$$\sum_{i=1}^{a} \sum_{j=1}^{b} (X_{ij} - \bar{X}_{..})^2 = Q_3 + Q_6 + Q_7.$$

(c) Argue that the three terms in the right-hand member of part (b), once
divided by $\sigma^2$, are independent random variables with chi-square
distributions, provided that $d = 0$.

(d) The ratio $Q_7/(Q_3 + Q_6)$ times what constant has an $F$-distribution,
provided that $d = 0$? Note that this $F$ is really the square of the
two-sample $T$ used to test the equality of the mean of the first
distribution and the common mean of the other distributions, in which
the last $b - 1$ samples are combined into one.

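10.3 Noncentral $\chi^2$ and Noncentral $F$

Let $X_1, X_2, \ldots, X_n$ denote independent random variables that are
$N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, n$, and let $Y = \sum_{1}^{n} X_i^2/\sigma^2$. If each $\mu_i$ is zero, we
know that $Y$ is $\chi^2(n)$. We shall now investigate the distribution of $Y$
when each $\mu_i$ is not zero. The m.g.f. of $Y$ is given by

$$M(t) = E\left[\exp\left(t \sum_{i=1}^{n} \frac{X_i^2}{\sigma^2}\right)\right] = \prod_{i=1}^{n} E\left[\exp\left(t\,\frac{X_i^2}{\sigma^2}\right)\right].$$

Consider

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\right] dx_i.$$

The integral exists if $t < \frac{1}{2}$. To evaluate the integral, note that

$$\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2} = -\frac{x_i^2(1 - 2t)}{2\sigma^2} + \frac{2\mu_i x_i}{2\sigma^2} - \frac{\mu_i^2}{2\sigma^2}$$

$$= \frac{t\mu_i^2}{\sigma^2(1 - 2t)} - \frac{1 - 2t}{2\sigma^2}\left(x_i - \frac{\mu_i}{1 - 2t}\right)^2.$$

Accordingly, with $t < \frac{1}{2}$, we have

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \exp\left[\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right] \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1 - 2t}{2\sigma^2}\left(x_i - \frac{\mu_i}{1 - 2t}\right)^2\right] dx_i.$$

If we multiply the integrand by $\sqrt{1 - 2t}$, $t < \frac{1}{2}$, we have the integral
of a normal p.d.f. with mean $\mu_i/(1 - 2t)$ and variance $\sigma^2/(1 - 2t)$. Thus

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \frac{1}{\sqrt{1 - 2t}} \exp\left[\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right],$$

and the m.g.f. of $Y = \sum_{1}^{n} X_i^2/\sigma^2$ is given by

$$M(t) = \frac{1}{(1 - 2t)^{n/2}} \exp\left[\frac{t \sum_{1}^{n} \mu_i^2}{\sigma^2(1 - 2t)}\right], \qquad t < \tfrac{1}{2}.$$

A random variable that has an m.g.f. of the functional form

$$M(t) = \frac{1}{(1 - 2t)^{r/2}}\, e^{t\theta/(1 - 2t)},$$

where $t < \frac{1}{2}$, $0 < \theta$, and $r$ is a positive integer, is said to have a
noncentral chi-square distribution with $r$ degrees of freedom and
noncentrality parameter $\theta$. If one sets the noncentrality parameter
$\theta = 0$, one has $M(t) = (1 - 2t)^{-r/2}$, which is the m.g.f. of a random
variable that is $\chi^2(r)$. Such a random variable can appropriately be
called a central chi-square variable. We shall use the symbol $\chi^2(r, \theta)$ to
denote a noncentral chi-square distribution that has the parameters $r$
and $\theta$; and we shall say that a random variable is $\chi^2(r, \theta)$ when that
random variable has this kind of distribution. The symbol $\chi^2(r, 0)$ is
equivalent to $\chi^2(r)$. Thus our random variable $Y = \sum_{1}^{n} X_i^2/\sigma^2$ of this
section is $\chi^2\left(n, \sum_{1}^{n} \mu_i^2/\sigma^2\right)$. If each $\mu_i$ is equal to zero, then $Y$ is $\chi^2(n, 0)$
or, more simply, $Y$ is $\chi^2(n)$.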
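
The closed form of the m.g.f. can be compared with a Monte Carlo estimate of $E[\exp(tY)]$. In the sketch below the means $\mu_i$, the value of $\sigma$, the point $t$, and the number of replications are arbitrary illustrative choices; $t$ is kept below $\frac{1}{4}$ so that $\exp(tY)$ has finite variance and the sample average is stable.

```python
import math
import random

def mgf_noncentral_chi2(t, r, theta):
    """M(t) = (1 - 2t)^(-r/2) * exp(t*theta / (1 - 2t)), defined for t < 1/2."""
    if t >= 0.5:
        raise ValueError("the m.g.f. exists only for t < 1/2")
    return (1 - 2 * t) ** (-r / 2) * math.exp(t * theta / (1 - 2 * t))

random.seed(2)
mus, sigma, t = [1.0, 2.0, 0.0], 1.5, 0.1    # hypothetical settings
theta = sum(m * m for m in mus) / sigma**2    # noncentrality parameter
reps = 200_000

total = 0.0
for _ in range(reps):
    # Y = sum of X_i^2 / sigma^2 with X_i ~ N(mu_i, sigma^2), independent
    y = sum(random.gauss(m, sigma) ** 2 for m in mus) / sigma**2
    total += math.exp(t * y)

estimate = total / reps
exact = mgf_noncentral_chi2(t, len(mus), theta)
print(estimate, exact)
```

The two printed numbers should agree to about two decimal places; closer agreement would require more replications.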

The noncentral chi-square variables in which we have interest are
certain quadratic forms, in normally distributed variables, divided by
a variance $\sigma^2$. In our example it is worth noting that the noncentrality
parameter of $\sum_{1}^{n} X_i^2/\sigma^2$, which is $\sum_{1}^{n} \mu_i^2/\sigma^2$, may be computed by
replacing each $X_i$ in the quadratic form by its mean $\mu_i$, $i = 1, 2, \ldots, n$.
This is no fortuitous circumstance; any quadratic form
$Q = Q(X_1, \ldots, X_n)$ in normally distributed variables, which is such that
$Q/\sigma^2$ is $\chi^2(r, \theta)$, has $\theta = Q(\mu_1, \mu_2, \ldots, \mu_n)/\sigma^2$; and if $Q/\sigma^2$ is a chi-square
variable (central or noncentral) for certain real values of $\mu_1, \mu_2, \ldots, \mu_n$,
it is chi-square (central or noncentral) for all real values of these means.

It should be pointed out that Theorem 1, Section 10.1, is valid
whether the random variables are central or noncentral chi-square
variables.

We next discuss a noncentral $F$-variable. If $U$ and $V$ are independent
and are, respectively, $\chi^2(r_1)$ and $\chi^2(r_2)$, the random variable
$F$ has been defined by $F = r_2 U/r_1 V$. Now suppose, in particular, that
$U$ is $\chi^2(r_1, \theta)$, $V$ is $\chi^2(r_2)$, and that $U$ and $V$ are independent. The random
variable $r_2 U/r_1 V$ is called a noncentral $F$-variable with $r_1$ and $r_2$ degrees
of freedom and with noncentrality parameter $\theta$. Note that the
noncentrality parameter of $F$ is precisely the noncentrality parameter
of the random variable $U$, which is $\chi^2(r_1, \theta)$.

Tables of noncentral chi-square and noncentral F are available in 
the literature. However, like those of noncentral t, they are too bulky 
to be put in this book. 

EXERCISES 

10.11. Let $Y_i$, $i = 1, 2, \ldots, n$, denote independent random variables that
are, respectively, $\chi^2(r_i, \theta_i)$, $i = 1, 2, \ldots, n$. Prove that $Z = \sum_{1}^{n} Y_i$ is
$\chi^2\left(\sum_{1}^{n} r_i, \sum_{1}^{n} \theta_i\right)$.

10.12. Compute the mean and the variance of a random variable that is
$\chi^2(r, \theta)$.

10.13. Compute the mean of a random variable that has a noncentral
$F$-distribution with degrees of freedom $r_1$ and $r_2 > 2$ and noncentrality
parameter $\theta$.

10.14. Show that the square of a noncentral $T$ random variable is a
noncentral $F$ random variable.

10.15. Let $X_1$ and $X_2$ be two independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ be $\chi^2(r_1, \theta_1)$ and $\chi^2(r, \theta)$, respectively. Here $r_1 < r$ and $\theta_1 \le \theta$.
Show that $X_2$ is $\chi^2(r - r_1, \theta - \theta_1)$.

10.16. In Exercise 10.6, if $\mu_1, \mu_2, \ldots, \mu_b$ are not equal, what are the
distributions of $Q_3'/\sigma^2$, $Q_4'/\sigma^2$, and $F$?


10.4 Multiple Comparisons 

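Consider $b$ independent random variables that have normal
distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_b$, respectively, and
with unknown but common variance $\sigma^2$. Let $k_1, k_2, \ldots, k_b$ represent
$b$ known real constants that are not all zero. We want to find a
confidence interval for $\sum_{1}^{b} k_j \mu_j$, a linear function of the means
$\mu_1, \mu_2, \ldots, \mu_b$. To do this, we take a random sample $X_{1j}, X_{2j}, \ldots, X_{aj}$
of size $a$ from the distribution $N(\mu_j, \sigma^2)$, $j = 1, 2, \ldots, b$. If we denote
$\sum_{i=1}^{a} X_{ij}/a$ by $\bar{X}_{.j}$, then we know that $\bar{X}_{.j}$ is $N(\mu_j, \sigma^2/a)$, that
$\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/\sigma^2$ is $\chi^2(a - 1)$, and that the two random variables are
independent. Since the independent random samples are taken from
the $b$ distributions, the $2b$ random variables $\bar{X}_{.j}$, $\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/\sigma^2$,
$j = 1, 2, \ldots, b$, are independent. Moreover, $\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.b}$ and

$$\sum_{j=1}^{b} \sum_{i=1}^{a} \frac{(X_{ij} - \bar{X}_{.j})^2}{\sigma^2}$$

are independent and the latter is $\chi^2[b(a - 1)]$. Let $Z = \sum_{1}^{b} k_j \bar{X}_{.j}$. Then
$Z$ is normal with mean $\sum_{1}^{b} k_j \mu_j$ and variance $\left(\sum_{1}^{b} k_j^2\right)\sigma^2/a$, and $Z$ is
independent of

$$V = \frac{1}{b(a - 1)} \sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2.$$

Hence the random variable

$$T = \frac{\left(\sum_{1}^{b} k_j \bar{X}_{.j} - \sum_{1}^{b} k_j \mu_j\right) \Big/ \sqrt{\left(\sum_{1}^{b} k_j^2\right)\sigma^2/a}}{\sqrt{V/\sigma^2}} = \frac{\sum_{1}^{b} k_j \bar{X}_{.j} - \sum_{1}^{b} k_j \mu_j}{\sqrt{\left(\sum_{1}^{b} k_j^2\right) V/a}}$$

has a $t$-distribution with $b(a - 1)$ degrees of freedom. A positive
number $c$ can be found in Table IV in Appendix B, for certain values
of $\alpha$, $0 < \alpha < 1$, such that $\Pr(-c \le T \le c) = 1 - \alpha$. It follows that the
probability is $1 - \alpha$ that

$$\sum_{1}^{b} k_j \bar{X}_{.j} - c\sqrt{\left(\sum_{1}^{b} k_j^2\right)\frac{V}{a}} \le \sum_{1}^{b} k_j \mu_j \le \sum_{1}^{b} k_j \bar{X}_{.j} + c\sqrt{\left(\sum_{1}^{b} k_j^2\right)\frac{V}{a}}.$$

The experimental values of $\bar{X}_{.j}$, $j = 1, 2, \ldots, b$, and $V$ will provide a
$100(1 - \alpha)$ percent confidence interval for $\sum_{1}^{b} k_j \mu_j$.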
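
The interval just derived is straightforward to compute once $c$ has been read from a $t$ table. The sketch below is one way to do so; the function name and the data are hypothetical, and the value $c = 2.447$ is the tabled point for $\alpha = 0.05$ with $b(a - 1) = 6$ degrees of freedom, supplied by hand rather than computed.

```python
import math

def linear_function_interval(x, k, c):
    """Interval for sum_j k_j mu_j.

    x[i][j] is observation i (of a) from distribution j (of b), k holds the
    b constants, and c is the t-table value with b(a-1) degrees of freedom.
    """
    a, b = len(x), len(x[0])
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]     # X-bar_.j
    v = sum((x[i][j] - col[j]) ** 2
            for i in range(a) for j in range(b)) / (b * (a - 1))     # V
    z = sum(kj * m for kj, m in zip(k, col))                         # Z
    half = c * math.sqrt(sum(kj * kj for kj in k) * v / a)
    return z - half, z + half

# Hypothetical data: a = 3 observations from each of b = 3 distributions.
x = [[4.1, 5.0, 6.2],
     [3.8, 5.6, 5.9],
     [4.4, 5.3, 6.5]]
lo, hi = linear_function_interval(x, k=[-1, 1, 0], c=2.447)  # mu_2 - mu_1
print(lo, hi)
```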

It should be observed that the confidence interval for $\sum_{1}^{b} k_j \mu_j$
depends upon the particular choice of $k_1, k_2, \ldots, k_b$. It is conceivable
that we may be interested in more than one linear function of
$\mu_1, \mu_2, \ldots, \mu_b$, such as $\mu_2 - \mu_1$, $\mu_3 - (\mu_1 + \mu_2)/2$, or $\mu_1 + \cdots + \mu_b$. We
can, of course, find for each $\sum_{1}^{b} k_j \mu_j$ a random interval that has a
preassigned probability of including that particular $\sum_{1}^{b} k_j \mu_j$. But how
can we compute the probability that simultaneously these random
intervals include their respective linear functions of $\mu_1, \mu_2, \ldots, \mu_b$? The
following procedure of multiple comparisons, due to Scheffé, is one
solution to this problem.

The random variable

$$\frac{\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2}{\sigma^2/a}$$

is $\chi^2(b)$ and, because it is a function of $\bar{X}_{.1}, \ldots, \bar{X}_{.b}$ alone, it is
independent of the random variable

$$V = \frac{1}{b(a - 1)} \sum_{j=1}^{b} \sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2.$$

Hence the random variable

$$F = \frac{a \sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2 / b}{V}$$

has an $F$-distribution with $b$ and $b(a - 1)$ degrees of freedom. From
Table V in Appendix B, for certain values of $\alpha$, we can find a constant
$d$ such that $\Pr(F \le d) = 1 - \alpha$ or

$$\Pr\left[\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2 \le bd\,\frac{V}{a}\right] = 1 - \alpha.$$

Note that $\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2$ is the square of the distance, in $b$-dimensional
space, from the point $(\mu_1, \mu_2, \ldots, \mu_b)$ to the random point
$(\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.b})$. Consider a space of dimension $b$ and let
$(t_1, t_2, \ldots, t_b)$ denote the coordinates of a point in that space.
An equation of a hyperplane that passes through the point
$(\mu_1, \mu_2, \ldots, \mu_b)$ is given by

$$k_1(t_1 - \mu_1) + k_2(t_2 - \mu_2) + \cdots + k_b(t_b - \mu_b) = 0, \tag{1}$$

where not all the real numbers $k_j$, $j = 1, 2, \ldots, b$, are equal to zero.
The square of the distance from this hyperplane to the point
$(t_1 = \bar{X}_{.1}, t_2 = \bar{X}_{.2}, \ldots, t_b = \bar{X}_{.b})$ is

$$\frac{[k_1(\bar{X}_{.1} - \mu_1) + k_2(\bar{X}_{.2} - \mu_2) + \cdots + k_b(\bar{X}_{.b} - \mu_b)]^2}{k_1^2 + k_2^2 + \cdots + k_b^2}. \tag{2}$$

From the geometry of the situation it follows that $\sum_{1}^{b} (\bar{X}_{.j} - \mu_j)^2$ is equal
to the maximum of expression (2) with respect to $k_1, k_2, \ldots, k_b$. Thus
the inequality $\sum_{1}^{b} (\bar{X}_{.j} - \mu_j)^2 \le (bd)(V/a)$ holds if and only if

$$\frac{\left[\sum_{j=1}^{b} k_j(\bar{X}_{.j} - \mu_j)\right]^2}{\sum_{j=1}^{b} k_j^2} \le bd\,\frac{V}{a} \tag{3}$$

for every real $k_1, k_2, \ldots, k_b$, not all zero. Accordingly, these two
equivalent events have the same probability, $1 - \alpha$. However,
inequality (3) may be written in the form

$$\left|\sum_{1}^{b} k_j \bar{X}_{.j} - \sum_{1}^{b} k_j \mu_j\right| \le \sqrt{bd\left(\sum_{1}^{b} k_j^2\right)\frac{V}{a}}.$$

Thus the probability is $1 - \alpha$ that simultaneously, for all real
$k_1, k_2, \ldots, k_b$, not all zero,

$$\sum_{1}^{b} k_j \bar{X}_{.j} - \sqrt{bd\left(\sum_{1}^{b} k_j^2\right)\frac{V}{a}} \le \sum_{1}^{b} k_j \mu_j \le \sum_{1}^{b} k_j \bar{X}_{.j} + \sqrt{bd\left(\sum_{1}^{b} k_j^2\right)\frac{V}{a}}. \tag{4}$$

Denote by $A$ the event where inequality (4) is true for all real
$k_1, \ldots, k_b$, and denote by $B$ the event where that inequality is true for
a finite number of $b$-tuples $(k_1, \ldots, k_b)$. If the event $A$ occurs, certainly
the event $B$ occurs. Hence $P(A) \le P(B)$. In the applications, one is often
interested only in a finite number of linear functions $\sum_{1}^{b} k_j \mu_j$. Once
the experimental values are available, we obtain from (4) a confidence
interval for each of these linear functions. Since $P(B) \ge P(A) = 1 - \alpha$,
we have a confidence coefficient of at least $100(1 - \alpha)$ percent that the
linear functions are in these respective confidence intervals.
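
Inequality (4) can be applied to several linear functions at once. The sketch below computes the simultaneous intervals for a list of $b$-tuples; the function name and the data are hypothetical, and $d = 4.76$ is the tabled $F$ point for $\alpha = 0.05$ with $b = 3$ and $b(a - 1) = 6$ degrees of freedom, supplied by hand.

```python
import math

def scheffe_intervals(x, k_tuples, d):
    """Simultaneous intervals from inequality (4).

    x[i][j] is observation i (of a) from distribution j (of b); k_tuples is a
    list of b-tuples (k_1, ..., k_b); d is the F-table value with b and b(a-1)
    degrees of freedom.
    """
    a, b = len(x), len(x[0])
    col = [sum(x[i][j] for i in range(a)) / a for j in range(b)]     # X-bar_.j
    v = sum((x[i][j] - col[j]) ** 2
            for i in range(a) for j in range(b)) / (b * (a - 1))     # V
    intervals = []
    for k in k_tuples:
        z = sum(kj * m for kj, m in zip(k, col))
        half = math.sqrt(b * d * sum(kj * kj for kj in k) * v / a)
        intervals.append((z - half, z + half))
    return intervals

# Hypothetical data: a = 3 observations from each of b = 3 distributions.
x = [[4.1, 5.0, 6.2],
     [3.8, 5.6, 5.9],
     [4.4, 5.3, 6.5]]
k_tuples = [(-1, 1, 0), (0, -1, 1), (-0.5, -0.5, 1)]
intervals = scheffe_intervals(x, k_tuples, d=4.76)
for k, (lo, hi) in zip(k_tuples, intervals):
    print(k, lo, hi)
```

For a single linear function the factor $\sqrt{bd}$ here replaces the $t$-table value $c$ of the individual interval, which is why the simultaneous intervals are longer.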

Remarks. If the sample sizes, say $a_1, a_2, \ldots, a_b$, are unequal, inequality
(4) becomes

$$\sum_{1}^{b} k_j \bar{X}_{.j} - \sqrt{bd \sum_{1}^{b} \frac{k_j^2}{a_j}\, V} \le \sum_{1}^{b} k_j \mu_j \le \sum_{1}^{b} k_j \bar{X}_{.j} + \sqrt{bd \sum_{1}^{b} \frac{k_j^2}{a_j}\, V}, \tag{4'}$$

where

$$\bar{X}_{.j} = \frac{\sum_{i=1}^{a_j} X_{ij}}{a_j}, \qquad V = \frac{\sum_{j=1}^{b} \sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{.j})^2}{\sum_{1}^{b} (a_j - 1)},$$

and $d$ is selected from Table V with $b$ and $\sum_{1}^{b} (a_j - 1)$ degrees of freedom.
Inequality (4') reduces to inequality (4) when $a_1 = a_2 = \cdots = a_b$.

Moreover, if we restrict our attention to linear functions of the form $\sum_{1}^{b} k_j \mu_j$
with $\sum_{1}^{b} k_j = 0$ (such linear functions are called contrasts), the radical in
inequality (4') is replaced by

$$\sqrt{d(b - 1) \sum_{1}^{b} \frac{k_j^2}{a_j}\, V},$$

where $d$ is now found in Table V with $b - 1$ and $\sum_{1}^{b} (a_j - 1)$ degrees of freedom.

In these multiple comparisons, one often finds that the length of a
confidence interval is much greater than the length of a $100(1 - \alpha)$ percent
confidence interval for a particular linear function $\sum_{1}^{b} k_j \mu_j$. But this is to be
expected because in one case the probability $1 - \alpha$ applies to just one event,
and in the other it applies to the simultaneous occurrence of many events.
One reasonable way to reduce the length of these intervals is to take a larger
value of $\alpha$, say 0.25, instead of 0.05. After all, it is still a very strong statement
to say that the probability is 0.75 that all these events occur.

EXERCISES 

10.17. If $A_1, A_2, \ldots, A_k$ are events, prove, by induction, Boole's inequality

$$P(A_1 \cup A_2 \cup \cdots \cup A_k) \le \sum_{1}^{k} P(A_i).$$

Then show that

$$P(A_1^* \cap A_2^* \cap \cdots \cap A_k^*) \ge 1 - \sum_{1}^{k} P(A_i).$$

10.18. In the notation of this section, let $(k_{i1}, k_{i2}, \ldots, k_{ib})$, $i = 1, 2, \ldots, m$,
represent a finite number of $b$-tuples. The problem is to find simultaneous
confidence intervals for $\sum_{j=1}^{b} k_{ij} \mu_j$, $i = 1, 2, \ldots, m$, by a method different
from that of Scheffé. Define the random variable $T_i$ by

$$T_i = \frac{\sum_{j=1}^{b} k_{ij} \bar{X}_{.j} - \sum_{j=1}^{b} k_{ij} \mu_j}{\sqrt{\left(\sum_{j=1}^{b} k_{ij}^2\right) V/a}}, \qquad i = 1, 2, \ldots, m.$$

(a) Let the event $A_i^*$ be given by $-c_i \le T_i \le c_i$, $i = 1, 2, \ldots, m$. Find the
random variables $U_i$ and $W_i$ such that $U_i \le \sum_{j=1}^{b} k_{ij} \mu_j \le W_i$ is equivalent
to $A_i^*$.

(b) Select $c_i$ such that $P(A_i^*) = 1 - \alpha/m$; that is, $P(A_i) = \alpha/m$. Use the
results of Exercise 10.17 to determine a lower bound on the probability
that simultaneously the random intervals $(U_1, W_1), \ldots, (U_m, W_m)$
include $\sum_{j=1}^{b} k_{1j} \mu_j, \ldots, \sum_{j=1}^{b} k_{mj} \mu_j$, respectively.

(c) Let $a = 3$, $b = 6$, and $\alpha = 0.05$. Consider the linear functions $\mu_1 - \mu_2$,
$\mu_2 - \mu_3$, $\mu_3 - \mu_4$, $\mu_4 - (\mu_5 + \mu_6)/2$, and $(\mu_1 + \mu_2 + \cdots + \mu_6)/6$. Here
$m = 5$. Show that the lengths of the confidence intervals given by the
results of part (b) are shorter than the corresponding ones given by
the method of Scheffé, as described in the text. If $m$ becomes
sufficiently large, however, this is not the case.
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10.5 The Analysis of Variance 

The problem considered in Section 10.2 is an example of a method 
of statistical inference called the analysis of variance. This method 
derives its name from the fact that the quadratic form abS 2 , which is 
a total sum of squares, is resolved into several component parts. In this 
section other problems in the analysis of variance will be investigated. 

Let $X_{ij}$, $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, b$, denote $n = ab$ random variables that are independent and have normal distributions with common variance $\sigma^2$. The means of these normal distributions are $\mu_{ij} = \mu + \alpha_i + \beta_j$, where $\sum_1^a \alpha_i = 0$ and $\sum_1^b \beta_j = 0$. For example, take $a = 2$, $b = 3$, $\mu = 5$, $\alpha_1 = 1$, $\alpha_2 = -1$, $\beta_1 = 1$, $\beta_2 = 0$, and $\beta_3 = -1$. Then the $ab = $ six random variables have means

$$\mu_{11} = 7, \quad \mu_{12} = 6, \quad \mu_{13} = 5,$$
$$\mu_{21} = 5, \quad \mu_{22} = 4, \quad \mu_{23} = 3.$$

Had we taken $\beta_1 = \beta_2 = \beta_3 = 0$, the six random variables would have had means

$$\mu_{11} = 6, \quad \mu_{12} = 6, \quad \mu_{13} = 6,$$
$$\mu_{21} = 4, \quad \mu_{22} = 4, \quad \mu_{23} = 4.$$

Thus, if we wish to test the composite hypothesis that

$$\begin{aligned}
\mu_{11} &= \mu_{12} = \cdots = \mu_{1b},\\
\mu_{21} &= \mu_{22} = \cdots = \mu_{2b},\\
&\;\;\vdots\\
\mu_{a1} &= \mu_{a2} = \cdots = \mu_{ab},
\end{aligned}$$

we could say that we are testing the composite hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_b$ (and hence each $\beta_j = 0$, since their sum is zero). On the other hand, the composite hypothesis

$$\begin{aligned}
\mu_{11} &= \mu_{21} = \cdots = \mu_{a1},\\
\mu_{12} &= \mu_{22} = \cdots = \mu_{a2},\\
&\;\;\vdots\\
\mu_{1b} &= \mu_{2b} = \cdots = \mu_{ab},
\end{aligned}$$

is the same as the composite hypothesis that $\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$.

Remarks. The model just described, and others similar to it, are widely used in statistical applications. Consider a situation in which it is desirable to investigate the effects of two factors that influence an outcome. Thus the variety of a grain and the type of fertilizer used influence the yield; or the teacher and the size of a class may influence the score on a standard test. Let $X_{ij}$ denote the yield from the use of variety $i$ of a grain and type $j$ of fertilizer. A test of the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_b = 0$ would then be a test of the hypothesis that the mean yield of each variety of grain is the same regardless of the type of fertilizer used.

There is no loss of generality in assuming that $\sum_1^a \alpha_i = \sum_1^b \beta_j = 0$. To see this, let $\mu_{ij} = \mu' + \alpha_i' + \beta_j'$. Write $\bar\alpha' = \sum \alpha_i'/a$ and $\bar\beta' = \sum \beta_j'/b$. We have $\mu_{ij} = (\mu' + \bar\alpha' + \bar\beta') + (\alpha_i' - \bar\alpha') + (\beta_j' - \bar\beta') = \mu + \alpha_i + \beta_j$, where $\sum \alpha_i = \sum \beta_j = 0$.
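The centering argument of the preceding remark can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `center_effects` and the uncentered starting values are hypothetical, chosen so that the centered parameters agree with the example $\mu = 5$, $\alpha = (1, -1)$, $\beta = (1, 0, -1)$ used earlier in this section.

```python
def center_effects(mu_prime, alpha_prime, beta_prime):
    """Rewrite mu' + alpha'_i + beta'_j as mu + alpha_i + beta_j,
    where the alpha_i and the beta_j each sum to zero."""
    a_bar = sum(alpha_prime) / len(alpha_prime)   # average of the alpha'_i
    b_bar = sum(beta_prime) / len(beta_prime)     # average of the beta'_j
    mu = mu_prime + a_bar + b_bar
    alpha = [x - a_bar for x in alpha_prime]
    beta = [x - b_bar for x in beta_prime]
    return mu, alpha, beta


# Hypothetical uncentered parameters (an assumption for illustration).
mu_prime, alpha_prime, beta_prime = 2.0, [3.0, 1.0], [2.0, 1.0, 0.0]
mu, alpha, beta = center_effects(mu_prime, alpha_prime, beta_prime)

# The cell means are unchanged by the reparametrization.
means_before = [[mu_prime + ap + bp for bp in beta_prime] for ap in alpha_prime]
means_after = [[mu + al + be for be in beta] for al in alpha]
print(mu, alpha, beta)
print(means_before == means_after)
```

Only the labels of the parameters change; every $\mu_{ij}$ keeps its value, which is why the constraint costs nothing.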

To construct a test of the composite hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ against all alternative hypotheses, we could obtain the corresponding likelihood ratio. However, to gain more insight into such a test, let us reconsider the likelihood ratio test of Section 10.2, namely that of the equality of the means of $b$ distributions. There the important quadratic forms are $Q$, $Q_3$, and $Q_4$, which are related through the equation $Q = Q_4 + Q_3$. That is,

$$abS^2 = \sum_{j=1}^b \sum_{i=1}^a (\bar X_{.j} - \bar X_{..})^2 + \sum_{j=1}^b \sum_{i=1}^a (X_{ij} - \bar X_{.j})^2;$$

so we see that the total sum of squares, $abS^2$, is decomposed into a sum of squares, $Q_4$, among column means and a sum of squares, $Q_3$, within columns. The latter sum of squares, divided by $n = ab$, is the m.l.e. of $\sigma^2$, provided that the parameters are in $\Omega$; and we denote it by $\hat\sigma_\Omega^2$. Of course, $S^2$ is the m.l.e. of $\sigma^2$ under $\omega$, here denoted by $\hat\sigma_\omega^2$. So the likelihood ratio $\lambda = (\hat\sigma_\Omega^2/\hat\sigma_\omega^2)^{ab/2}$ is a monotone function of the statistic

$$F = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

upon which the test of the equality of means is based.

To help find a test for $H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$, where $\mu_{ij} = \mu + \alpha_i + \beta_j$, return to the decomposition of Example 3, Section 10.1, namely $Q = Q_2 + Q_4 + Q_5$. That is,

$$abS^2 = \sum_{i=1}^a \sum_{j=1}^b (\bar X_{i.} - \bar X_{..})^2 + \sum_{i=1}^a \sum_{j=1}^b (\bar X_{.j} - \bar X_{..})^2 + \sum_{i=1}^a \sum_{j=1}^b (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X_{..})^2;$$

thus the total sum of squares, $abS^2$, is decomposed into that among rows ($Q_2$), that among columns ($Q_4$), and that remaining ($Q_5$). It is interesting to observe that $\hat\sigma_\Omega^2 = Q_5/ab$ is the m.l.e. of $\sigma^2$ under $\Omega$ and

$$\hat\sigma_\omega^2 = \frac{Q_4 + Q_5}{ab} = \sum_{i=1}^a \sum_{j=1}^b \frac{(X_{ij} - \bar X_{i.})^2}{ab}$$

is that estimator under $\omega$. A useful monotone function of the likelihood ratio $\lambda = (\hat\sigma_\Omega^2/\hat\sigma_\omega^2)^{ab/2}$ is

$$F = \frac{Q_4/(b - 1)}{Q_5/[(a - 1)(b - 1)]},$$

which has, under $H_0$, an $F$-distribution with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom. The hypothesis $H_0$ is rejected if $F \ge c$, where $\alpha = \Pr(F \ge c; H_0)$.
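The decomposition $Q = Q_2 + Q_4 + Q_5$ and the resulting $F$-statistics are easy to compute directly. The sketch below is an illustration under stated assumptions: the function name `two_way_anova` and the $2 \times 3$ data table are hypothetical and do not come from the text; the formulas are those displayed above, together with the analogous ratio for rows.

```python
def two_way_anova(x):
    """Two-way classification, one observation per cell.
    x[i][j] is the observation in row i and column j."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(x[i][j] for i in range(a)) / a for j in range(b)]

    # Total sum of squares and its three components.
    q = sum((x[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    q2 = b * sum((m - grand) ** 2 for m in row_means)   # among rows
    q4 = a * sum((m - grand) ** 2 for m in col_means)   # among columns
    q5 = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(a) for j in range(b))       # remainder

    denom = q5 / ((a - 1) * (b - 1))
    return {
        "Q": q, "Q2": q2, "Q4": q4, "Q5": q5,
        "F_columns": (q4 / (b - 1)) / denom,   # tests beta_1 = ... = beta_b = 0
        "F_rows": (q2 / (a - 1)) / denom,      # tests alpha_1 = ... = alpha_a = 0
    }


# Hypothetical data with a = 2 rows and b = 3 columns.
x = [[8.0, 6.0, 5.0],
     [5.0, 4.0, 2.0]]
result = two_way_anova(x)
print(result)
```

Each $F$ value would then be compared with a tabled critical value of the $F$-distribution with the degrees of freedom given in the text.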

If we are to compute the power function of the test, we need the distribution of $F$ when $H_0$ is not true. From Section 10.3 we know, when $H_1$ is true, that $Q_4/\sigma^2$ and $Q_5/\sigma^2$ are independent (central or noncentral) chi-square variables. We shall compute the noncentrality parameters of $Q_4/\sigma^2$ and $Q_5/\sigma^2$ when $H_1$ is true. We have $E(X_{ij}) = \mu + \alpha_i + \beta_j$, $E(\bar X_{i.}) = \mu + \alpha_i$, $E(\bar X_{.j}) = \mu + \beta_j$, and $E(\bar X_{..}) = \mu$. Accordingly, the noncentrality parameter of $Q_4/\sigma^2$ is

$$\frac{a\sum_{j=1}^b (\mu + \beta_j - \mu)^2}{\sigma^2} = \frac{a\sum_{j=1}^b \beta_j^2}{\sigma^2}$$

and that of $Q_5/\sigma^2$ is

$$\frac{\sum_{j=1}^b \sum_{i=1}^a (\mu + \alpha_i + \beta_j - \mu - \alpha_i - \mu - \beta_j + \mu)^2}{\sigma^2} = 0.$$

Thus, if the hypothesis $H_0$ is not true, $F$ has a noncentral $F$-distribution with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom and noncentrality parameter $a\sum_{j=1}^b \beta_j^2/\sigma^2$. The desired probabilities can then be found in tables of the noncentral $F$-distribution.

A similar argument can be used to construct the $F$ needed to test the equality of row means; that is, this $F$ is essentially the ratio of the sum of squares among rows and $Q_5$. In particular, this $F$ is defined by

$$F = \frac{Q_2/(a - 1)}{Q_5/[(a - 1)(b - 1)]}$$

and, under $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$, has an $F$-distribution with $a - 1$ and $(a - 1)(b - 1)$ degrees of freedom.

The analysis-of-variance problem that has just been discussed is 
usually referred to as a two-way classification with one observation per 
cell. Each combination of i and j determines a cell; thus there is a total 
of ab cells in this model. Let us now investigate another two-way 
classification problem, but in this case we take c > 1 independent 
observations per cell. 

Let $X_{ijk}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, c$, denote $n = abc$ random variables which are independent and which have normal distributions with common, but unknown, variance $\sigma^2$. The mean of each $X_{ijk}$, $k = 1, 2, \ldots, c$, is $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, where $\sum_{i=1}^a \alpha_i = 0$, $\sum_{j=1}^b \beta_j = 0$, $\sum_{i=1}^a \gamma_{ij} = 0$, and $\sum_{j=1}^b \gamma_{ij} = 0$. For example, take $a = 2$, $b = 3$, $\mu = 5$, $\alpha_1 = 1$, $\alpha_2 = -1$, $\beta_1 = 1$, $\beta_2 = 0$, $\beta_3 = -1$, $\gamma_{11} = 1$, $\gamma_{12} = 1$, $\gamma_{13} = -2$, $\gamma_{21} = -1$, $\gamma_{22} = -1$, and $\gamma_{23} = 2$. Then the means are

$$\mu_{11} = 8, \quad \mu_{12} = 7, \quad \mu_{13} = 3,$$
$$\mu_{21} = 4, \quad \mu_{22} = 3, \quad \mu_{23} = 5.$$

Note that, if each $\gamma_{ij} = 0$, then

$$\mu_{11} = 7, \quad \mu_{12} = 6, \quad \mu_{13} = 5,$$
$$\mu_{21} = 5, \quad \mu_{22} = 4, \quad \mu_{23} = 3.$$

That is, if $\gamma_{ij} = 0$, each of the means in the first row is 2 greater than the corresponding mean in the second row. In general, if each $\gamma_{ij} = 0$, the means of row $i_1$ differ from the corresponding means of row $i_2$ by a constant. This constant may be different for different choices of $i_1$ and $i_2$. A similar statement can be made about the means of columns $j_1$ and $j_2$. The parameter $\gamma_{ij}$ is called the interaction associated with cell $(i, j)$. That is, the interaction between the $i$th level of one classification and the $j$th level of the other classification is $\gamma_{ij}$. One interesting hypothesis to test is that each interaction is equal to zero. This will now be investigated.
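The effect of the interactions on the cell means can be tabulated with a few lines of code. This is a small sketch using the parameter values of the example above; the helper name `cell_means` is hypothetical.

```python
def cell_means(mu, alpha, beta, gamma=None):
    """Return the a-by-b table of means mu + alpha_i + beta_j + gamma_ij.
    If gamma is omitted, every interaction is taken to be zero."""
    a, b = len(alpha), len(beta)
    if gamma is None:
        gamma = [[0] * b for _ in range(a)]
    return [[mu + alpha[i] + beta[j] + gamma[i][j] for j in range(b)]
            for i in range(a)]


# Parameter values from the example in the text.
mu, alpha, beta = 5, [1, -1], [1, 0, -1]
gamma = [[1, 1, -2], [-1, -1, 2]]

with_interaction = cell_means(mu, alpha, beta, gamma)
additive = cell_means(mu, alpha, beta)

# Differences between row 1 and row 2, column by column.
gaps_additive = [additive[0][j] - additive[1][j] for j in range(3)]
gaps_interaction = [with_interaction[0][j] - with_interaction[1][j]
                    for j in range(3)]
print(with_interaction, additive)
print(gaps_additive, gaps_interaction)
```

With every $\gamma_{ij} = 0$ the row differences are the constant 2; with the interactions of the example they vary from column to column, which is exactly what the hypothesis of zero interaction rules out.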

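From Exercise 10.4 of Section 10.1 we have that

$$\sum_{i=1}^a \sum_{j=1}^b \sum_{k=1}^c (X_{ijk} - \bar X_{...})^2 = bc\sum_{i=1}^a (\bar X_{i..} - \bar X_{...})^2 + ac\sum_{j=1}^b (\bar X_{.j.} - \bar X_{...})^2$$
$$\qquad + c\sum_{i=1}^a \sum_{j=1}^b (\bar X_{ij.} - \bar X_{i..} - \bar X_{.j.} + \bar X_{...})^2 + \sum_{i=1}^a \sum_{j=1}^b \sum_{k=1}^c (X_{ijk} - \bar X_{ij.})^2;$$

that is, the total sum of squares is decomposed into that due to row differences, that due to column differences, that due to interaction, and that within cells. The test of

$$H_0: \gamma_{ij} = 0, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b,$$

against all possible alternatives is based upon an $F$ with $(a - 1)(b - 1)$ and $ab(c - 1)$ degrees of freedom,

$$F = \frac{\left[c\sum_{i=1}^a \sum_{j=1}^b (\bar X_{ij.} - \bar X_{i..} - \bar X_{.j.} + \bar X_{...})^2\right]\big/[(a - 1)(b - 1)]}{\left[\sum_{i=1}^a \sum_{j=1}^b \sum_{k=1}^c (X_{ijk} - \bar X_{ij.})^2\right]\big/[ab(c - 1)]}.$$

The reader should verify that the noncentrality parameter of this $F$-distribution is equal to $c\sum_{j=1}^b \sum_{i=1}^a \gamma_{ij}^2/\sigma^2$. Thus $F$ is central when $H_0: \gamma_{ij} = 0$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, is true.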
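The four-part decomposition and the interaction $F$ can be computed in the same spirit. The following is a sketch only: the function name `interaction_f` and the $2 \times 2$ layout with $c = 2$ observations per cell are hypothetical choices for illustration, not data from the text.

```python
def interaction_f(x):
    """Two-way classification with c > 1 observations per cell.
    x[i][j] is the list of c observations in cell (i, j)."""
    a, b, c = len(x), len(x[0]), len(x[0][0])
    n = a * b * c
    grand = sum(v for row in x for cell in row for v in cell) / n
    cell = [[sum(x[i][j]) / c for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]

    total = sum((v - grand) ** 2 for r in x for cl in r for v in cl)
    ss_rows = b * c * sum((m - grand) ** 2 for m in row)
    ss_cols = a * c * sum((m - grand) ** 2 for m in col)
    ss_inter = c * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(a) for j in range(b))
    ss_within = sum((v - cell[i][j]) ** 2
                    for i in range(a) for j in range(b) for v in x[i][j])

    f = (ss_inter / ((a - 1) * (b - 1))) / (ss_within / (a * b * (c - 1)))
    return total, ss_rows, ss_cols, ss_inter, ss_within, f


# Hypothetical data: a = 2, b = 2, c = 2.
x = [[[3.0, 5.0], [6.0, 8.0]],
     [[4.0, 6.0], [1.0, 3.0]]]
total, ss_rows, ss_cols, ss_inter, ss_within, f = interaction_f(x)
print(total, ss_rows, ss_cols, ss_inter, ss_within, f)
```

The four components add up to the total sum of squares, and `f` is the statistic that would be referred to the $F$-distribution with $(a - 1)(b - 1)$ and $ab(c - 1)$ degrees of freedom.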

EXERCISES 

10.19. Show that

$$\sum_{j=1}^b \sum_{i=1}^a (X_{ij} - \bar X_{i.})^2 = \sum_{j=1}^b \sum_{i=1}^a (X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X_{..})^2 + a\sum_{j=1}^b (\bar X_{.j} - \bar X_{..})^2.$$

10.20. If at least one $\gamma_{ij} \ne 0$, show that the $F$, which is used to test that each interaction is equal to zero, has noncentrality parameter equal to $c\sum_{j=1}^b \sum_{i=1}^a \gamma_{ij}^2/\sigma^2$.

10.21. Using the background of the two-way classification with one observation per cell, show that the maximum likelihood estimators of $\alpha_i$, $\beta_j$, and $\mu$ are $\hat\alpha_i = \bar X_{i.} - \bar X_{..}$, $\hat\beta_j = \bar X_{.j} - \bar X_{..}$, and $\hat\mu = \bar X_{..}$, respectively. Show that these are unbiased estimators of their respective parameters and compute $\mathrm{var}(\hat\alpha_i)$, $\mathrm{var}(\hat\beta_j)$, and $\mathrm{var}(\hat\mu)$.

10.22. Prove, using the assumptions of this section, that the linear functions $X_{ij} - \bar X_{i.} - \bar X_{.j} + \bar X_{..}$ and $\bar X_{.j} - \bar X_{..}$ are uncorrelated.

10.23. Given the following observations associated with a two-way classification with $a = 3$ and $b = 4$, compute the $F$-statistics used to test the equality of the column means ($\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$) and the equality of the row means ($\alpha_1 = \alpha_2 = \alpha_3 = 0$), respectively.

Row/Column      1      2      3      4
    1          3.1    4.2    2.7    4.9
    2          2.7    2.9    1.8    3.0
    3          4.0    4.6    3.0    3.9


10.24. With the background of the two-way classification with $c > 1$ observations per cell, show that the maximum likelihood estimators of the parameters are $\hat\alpha_i = \bar X_{i..} - \bar X_{...}$, $\hat\beta_j = \bar X_{.j.} - \bar X_{...}$, $\hat\gamma_{ij} = \bar X_{ij.} - \bar X_{i..} - \bar X_{.j.} + \bar X_{...}$, and $\hat\mu = \bar X_{...}$. Show that these are unbiased estimators of the respective parameters. Compute the variance of each estimator.

10.25. Given the following observations in a two-way classification with $a = 3$, $b = 4$, and $c = 2$, compute the $F$-statistics used to test that all interactions are equal to zero ($\gamma_{ij} = 0$), all column means are equal ($\beta_j = 0$), and all row means are equal ($\alpha_i = 0$), respectively.

Row/Column      1      2      3      4
    1          3.1    4.2    2.7    4.9
               2.9    4.9    3.2    4.5
    2          2.7    2.9    1.8    3.0
               2.9    2.3    2.4    3.7
    3          4.0    4.6    3.0    3.9
               4.4    5.0    2.5    4.2


10.6 A Regression Problem 

There is often interest in the relation between two variables, for example, a student’s scholastic aptitude test score in mathematics and this same student’s grade in calculus. Frequently, one of these variables, say $x$, is known in advance of the other, and hence there is interest in predicting a future random variable $Y$. Since $Y$ is a random variable, we cannot predict its future observed value $Y = y$ with certainty. Thus let us first concentrate on the problem of estimating the mean of $Y$, that is, $E(Y)$. Now $E(Y)$ is usually a function of $x$; for example, in our illustration with the calculus grade, say $Y$, we would expect $E(Y)$ to increase with increasing mathematics aptitude score $x$. Sometimes $E(Y) = \mu(x)$ is assumed to be of a given form, such as linear or quadratic or exponential; that is, $\mu(x)$ could be assumed to be equal to $\alpha + \beta x$ or $\alpha + \beta x + \gamma x^2$ or $\alpha e^{\beta x}$. To estimate $E(Y) = \mu(x)$, or equivalently the parameters $\alpha$, $\beta$, and $\gamma$, we observe the random variable $Y$ for each of $n$ possibly different values of $x$, say $x_1, x_2, \ldots, x_n$, which are not all equal. Once the $n$ independent experiments have been performed, we have $n$ pairs of known numbers $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. These pairs are then used to estimate the mean $E(Y)$. Problems like this are often classified under regression because $E(Y) = \mu(x)$ is frequently called a regression curve.

Remark. A model for the mean like $\alpha + \beta x + \gamma x^2$ is called a linear model because it is linear in the parameters $\alpha$, $\beta$, and $\gamma$. Thus $\alpha e^{\beta x}$ is not a linear model because it is not linear in $\alpha$ and $\beta$. Note that, in Sections 10.1 to 10.4, all the means were linear in the parameters and hence linear models.

Let us begin with the case in which $E(Y) = \mu(x)$ is a linear function. The $n$ points are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$; so the first problem is that of fitting a straight line to the set of points (see Figure 10.1). In addition to assuming that the mean of $Y$ is a linear function, we assume that $Y_1, Y_2, \ldots, Y_n$ are independent normal variables with respective means $\alpha + \beta(x_i - \bar x)$, $i = 1, 2, \ldots, n$, and unknown variance $\sigma^2$, where $\bar x = \sum_{i=1}^n x_i/n$. Their joint p.d.f. is therefore the product of the individual probability density functions; that is, the likelihood function equals

$$L(\alpha, \beta, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{[y_i - \alpha - \beta(x_i - \bar x)]^2}{2\sigma^2}\right\} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)]^2}{2\sigma^2}\right\}.$$

FIGURE 10.1

To maximize $L(\alpha, \beta, \sigma^2)$, or, equivalently, to minimize

$$-\ln L(\alpha, \beta, \sigma^2) = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)]^2}{2\sigma^2},$$

we must select $\alpha$ and $\beta$ to minimize

$$H(\alpha, \beta) = \sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)]^2.$$

Since $|y_i - \alpha - \beta(x_i - \bar x)| = |y_i - \mu(x_i)|$ is the vertical distance from the point $(x_i, y_i)$ to the line $y = \mu(x)$, we note that $H(\alpha, \beta)$ represents the sum of the squares of those distances. Thus selecting $\alpha$ and $\beta$ so that the sum of the squares is minimized means that we are fitting the straight line to the data by the method of least squares.

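To minimize $H(\alpha, \beta)$, we find the two first partial derivatives

$$\frac{\partial H(\alpha, \beta)}{\partial\alpha} = 2\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)](-1)$$

and

$$\frac{\partial H(\alpha, \beta)}{\partial\beta} = 2\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)][-(x_i - \bar x)].$$

Setting $\partial H(\alpha, \beta)/\partial\alpha = 0$, we obtain

$$\sum_{i=1}^n y_i - n\alpha - \beta\sum_{i=1}^n (x_i - \bar x) = 0.$$

Since

$$\sum_{i=1}^n (x_i - \bar x) = 0,$$

we have that

$$\sum_{i=1}^n y_i - n\alpha = 0$$

and thus

$$\hat\alpha = \bar Y.$$

The equation $\partial H(\alpha, \beta)/\partial\beta = 0$ yields, with $\alpha$ replaced by $\bar y$,

$$\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) - \beta\sum_{i=1}^n (x_i - \bar x)^2 = 0$$

or, equivalently,

$$\hat\beta = \frac{\sum_{i=1}^n (Y_i - \bar Y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n Y_i(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$

To find the maximum likelihood estimator of $\sigma^2$, consider the partial derivative

$$\frac{\partial[-\ln L(\alpha, \beta, \sigma^2)]}{\partial(\sigma^2)} = \frac{n}{2\sigma^2} - \frac{\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)]^2}{2(\sigma^2)^2}.$$

Setting this equal to zero and replacing $\alpha$ and $\beta$ by their solutions $\hat\alpha$ and $\hat\beta$, we obtain

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n [Y_i - \hat\alpha - \hat\beta(x_i - \bar x)]^2.$$

Of course, due to invariance, $\widehat{\sigma^2} = (\hat\sigma)^2$.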
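The three estimators just derived can be computed directly from the pairs $(x_i, y_i)$. The sketch below assumes nothing beyond the formulas above; the function name `fit_line` and the five data points are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Maximum likelihood (least squares) estimates for the model
    E(Y_i) = alpha + beta * (x_i - x_bar) with common variance sigma^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    alpha_hat = sum(ys) / n                              # alpha_hat = Y_bar
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / sxx
    # The m.l.e. of sigma^2 divides the residual sum of squares by n.
    sigma2_hat = sum((y - alpha_hat - beta_hat * (x - x_bar)) ** 2
                     for x, y in zip(xs, ys)) / n
    return alpha_hat, beta_hat, sigma2_hat


# Hypothetical data (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha_hat, beta_hat, sigma2_hat = fit_line(xs, ys)
print(alpha_hat, beta_hat, sigma2_hat)
```

Note that the fitted line is $y = \hat\alpha + \hat\beta(x - \bar x)$, so $\hat\alpha$ is the height of the line at $\bar x$ rather than the intercept at $x = 0$.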

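Since $\hat\alpha$ is a linear function of independent and normally distributed random variables, $\hat\alpha$ has a normal distribution with mean

$$E(\hat\alpha) = E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{1}{n}\sum_{i=1}^n [\alpha + \beta(x_i - \bar x)] = \alpha,$$

and variance

$$\mathrm{var}(\hat\alpha) = \sum_{i=1}^n \left(\frac{1}{n}\right)^2 \mathrm{var}(Y_i) = \frac{\sigma^2}{n}.$$

The estimator $\hat\beta$ is also a linear function of $Y_1, Y_2, \ldots, Y_n$ and hence has a normal distribution with mean

$$E(\hat\beta) = \frac{\sum_{i=1}^n (x_i - \bar x)E(Y_i)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)[\alpha + \beta(x_i - \bar x)]}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\alpha\sum_{i=1}^n (x_i - \bar x) + \beta\sum_{i=1}^n (x_i - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta$$

and variance

$$\mathrm{var}(\hat\beta) = \sum_{i=1}^n \left[\frac{x_i - \bar x}{\sum_{j=1}^n (x_j - \bar x)^2}\right]^2 \mathrm{var}(Y_i) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2}.$$

It can be shown (Exercise 10.27) that

$$\sum_{i=1}^n [Y_i - \alpha - \beta(x_i - \bar x)]^2 = \sum_{i=1}^n \{(\hat\alpha - \alpha) + (\hat\beta - \beta)(x_i - \bar x) + [Y_i - \hat\alpha - \hat\beta(x_i - \bar x)]\}^2$$
$$\qquad = n(\hat\alpha - \alpha)^2 + (\hat\beta - \beta)^2\sum_{i=1}^n (x_i - \bar x)^2 + n\hat\sigma^2,$$

or, for brevity,

$$Q = Q_1 + Q_2 + Q_3.$$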

Here $Q$, $Q_1$, $Q_2$, and $Q_3$ are real quadratic forms in the variables

$$Y_i - \alpha - \beta(x_i - \bar x), \qquad i = 1, 2, \ldots, n.$$

In this equation, $Q$ represents the sum of the squares of $n$ independent random variables that have normal distributions with means zero and variances $\sigma^2$. Thus $Q/\sigma^2$ has a chi-square distribution with $n$ degrees of freedom. Each of the random variables $\sqrt{n}(\hat\alpha - \alpha)/\sigma$ and $\sqrt{\sum_1^n (x_i - \bar x)^2}\,(\hat\beta - \beta)/\sigma$ has a normal distribution with zero mean and unit variance; thus each of $Q_1/\sigma^2$ and $Q_2/\sigma^2$ has a chi-square distribution with 1 degree of freedom. Since $Q_3$ is nonnegative, we have, in accordance with the theorem of Section 10.1, that $Q_1$, $Q_2$, and $Q_3$ are independent, so that $Q_3/\sigma^2$ has a chi-square distribution with $n - 1 - 1 = n - 2$ degrees of freedom. Then each of the random variables

$$T_1 = \frac{[\sqrt{n}(\hat\alpha - \alpha)]/\sigma}{\sqrt{Q_3/[\sigma^2(n - 2)]}} = \frac{\hat\alpha - \alpha}{\sqrt{\hat\sigma^2/(n - 2)}}$$

and

$$T_2 = \frac{\left[\sqrt{\sum_1^n (x_i - \bar x)^2}\,(\hat\beta - \beta)\right]/\sigma}{\sqrt{Q_3/[\sigma^2(n - 2)]}} = \frac{\hat\beta - \beta}{\sqrt{n\hat\sigma^2\big/\left[(n - 2)\sum_1^n (x_i - \bar x)^2\right]}}$$

has a $t$-distribution with $n - 2$ degrees of freedom. These facts enable us to obtain confidence intervals for $\alpha$ and $\beta$. The fact that $n\hat\sigma^2/\sigma^2$ has a chi-square distribution with $n - 2$ degrees of freedom provides a means of determining a confidence interval for $\sigma^2$. These are some of the statistical inferences about the parameters to which reference was made in the introductory remarks of this section.
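Confidence intervals for $\alpha$ and $\beta$ follow from $T_1$ and $T_2$ once a critical value of the $t$-distribution with $n - 2$ degrees of freedom has been read from a table. The following is a sketch under stated assumptions: the function name `regression_intervals` and the data are hypothetical, and the critical value 3.182 (the upper 2.5 percent point for 3 degrees of freedom, as commonly tabled) is supplied by the user rather than computed.

```python
import math


def regression_intervals(xs, ys, t_crit):
    """Intervals for alpha and beta based on T_1 and T_2.
    t_crit is a tabled critical value of t with n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    alpha_hat = sum(ys) / n
    beta_hat = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / sxx
    # Q_3 = n * sigma_hat^2, the residual sum of squares.
    q3 = sum((y - alpha_hat - beta_hat * (x - x_bar)) ** 2
             for x, y in zip(xs, ys))
    se_alpha = math.sqrt(q3 / (n * (n - 2)))   # sqrt(sigma_hat^2 / (n - 2))
    se_beta = math.sqrt(q3 / ((n - 2) * sxx))  # sqrt(n sigma_hat^2 / ((n - 2) Sxx))
    alpha_ci = (alpha_hat - t_crit * se_alpha, alpha_hat + t_crit * se_alpha)
    beta_ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
    return alpha_ci, beta_ci


# Hypothetical data with n = 5, so there are 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha_ci, beta_ci = regression_intervals(xs, ys, t_crit=3.182)
print(alpha_ci, beta_ci)
```

An interval for $\sigma^2$ could be built in the same way from two tabled chi-square values with $n - 2$ degrees of freedom; it is omitted here to keep the sketch short.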


Remark. The more discerning reader should quite properly question our constructions of $T_1$ and $T_2$ immediately above. We know that the squares of the linear forms are independent of $Q_3 = n\hat\sigma^2$, but we do not know, at this time, that the linear forms themselves enjoy this independence. This problem arises again in Section 10.7. In Exercise 10.47, a more general problem is proposed, of which the present case is a special instance.

EXERCISES 

10.26. Students’ scores on the mathematics portion of the ACT examination, $x$, and on the final examination in first-semester calculus (200 points possible), $y$, are given.

(a) Calculate the least squares regression line for these data.

(b) Plot the points and the least squares regression line on the same graph.

(c) Find point estimates for $\alpha$, $\beta$, and $\sigma^2$.

(d) Find 95 percent confidence intervals for $\alpha$ and $\beta$ under the usual assumptions.

 x      y       x      y
25    138      20    100
20     84      25    143
26    104      26    141
26    112      28    161
28     88      25    124
28    132      31    118
29     90      30    168
32    183

10.27. Show that

$$\sum_{i=1}^n [Y_i - \alpha - \beta(x_i - \bar x)]^2 = n(\hat\alpha - \alpha)^2 + (\hat\beta - \beta)^2\sum_{i=1}^n (x_i - \bar x)^2 + \sum_{i=1}^n [Y_i - \hat\alpha - \hat\beta(x_i - \bar x)]^2.$$

10.28. Let the independent random variables $Y_1, Y_2, \ldots, Y_n$ have, respectively, the probability density functions $N(\beta x_i, \gamma^2 x_i^2)$, $i = 1, 2, \ldots, n$, where the given numbers $x_1, x_2, \ldots, x_n$ are not all equal and no one is zero. Find the maximum likelihood estimators of $\beta$ and $\gamma^2$.

10.29. Let the independent random variables $Y_1, \ldots, Y_n$ have the joint p.d.f.

$$L(\alpha, \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n [y_i - \alpha - \beta(x_i - \bar x)]^2\right\},$$

where the given numbers $x_1, x_2, \ldots, x_n$ are not all equal. Let $H_0: \beta = 0$ ($\alpha$ and $\sigma^2$ unspecified). It is desired to use a likelihood ratio test to test $H_0$ against all possible alternatives. Find $\lambda$ and see whether the test can be based on a familiar statistic.

Hint: In the notation of this section show that

$$\sum_{i=1}^n (Y_i - \hat\alpha)^2 = Q_3 + \hat\beta^2\sum_{i=1}^n (x_i - \bar x)^2.$$

10.30. Using the notation of Section 10.2, assume that the means $\mu_j$ satisfy a linear function of $j$, namely $\mu_j = c + d[j - (b + 1)/2]$. Let independent random samples of size $a$ be taken from the $b$ normal distributions with common unknown variance $\sigma^2$.

(a) Show that the maximum likelihood estimators of $c$ and $d$ are, respectively, $\hat c = \bar X_{..}$ and

$$\hat d = \frac{\sum_{j=1}^b [j - (b + 1)/2](\bar X_{.j} - \bar X_{..})}{\sum_{j=1}^b [j - (b + 1)/2]^2}.$$

(b) Show that

$$\sum_{i=1}^a \sum_{j=1}^b (X_{ij} - \bar X_{..})^2 = \sum_{i=1}^a \sum_{j=1}^b \left[X_{ij} - \bar X_{..} - \hat d\left(j - \frac{b + 1}{2}\right)\right]^2 + \hat d^2\sum_{j=1}^b a\left(j - \frac{b + 1}{2}\right)^2.$$

(c) Argue that the two terms in the right-hand member of part (b), once divided by $\sigma^2$, are independent random variables with chi-square distributions provided that $d = 0$.

(d) What $F$-statistic would be used to test the equality of the means, that is, $H_0: d = 0$?

10.7 A Test of Independence 

Let $X$ and $Y$ have a bivariate normal distribution with means $\mu_1$ and $\mu_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. We wish to test the hypothesis that $X$ and $Y$ are independent. Because two jointly normally distributed random variables are independent if and only if $\rho = 0$, we test the hypothesis $H_0: \rho = 0$ against the hypothesis $H_1: \rho \ne 0$. A likelihood ratio test will be used. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of size $n > 2$ from the bivariate normal distribution; that is, the joint p.d.f. of these $2n$ random variables is given by

$$f(x_1, y_1)f(x_2, y_2) \cdots f(x_n, y_n).$$

Although it is fairly difficult to show, the statistic that is defined by the likelihood ratio $\lambda$ is a function of the statistic

$$R = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2 \sum_{i=1}^n (Y_i - \bar Y)^2}}.$$

This statistic $R$ is called the correlation coefficient of the random sample. The likelihood ratio principle, which calls for the rejection of $H_0$ if $\lambda \le \lambda_0$, is equivalent to the computed value of $|R| \ge c$. That is, if the absolute value of the correlation coefficient of the sample is too large, we reject the hypothesis that the correlation coefficient of the distribution is equal to zero. To determine a value of $c$ for a satisfactory significance level, it will be necessary to obtain the distribution of $R$, or a function of $R$, when $H_0$ is true. This will now be done.
Let $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, $n > 2$, where $x_1, x_2, \ldots, x_n$ and $\bar x = \sum_1^n x_i/n$ are fixed numbers such that $\sum_1^n (x_i - \bar x)^2 > 0$. Consider the conditional p.d.f. of $Y_1, Y_2, \ldots, Y_n$, given that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$. Because $Y_1, Y_2, \ldots, Y_n$ are independent and, with $\rho = 0$, are also independent of $X_1, X_2, \ldots, X_n$, this conditional p.d.f. is given by

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma_2}\right)^n \exp\left\{-\frac{\sum_{i=1}^n (y_i - \mu_2)^2}{2\sigma_2^2}\right\}.$$

Let $R_c$ be the correlation coefficient, given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, so that

$$\frac{R_c\sqrt{\sum_{i=1}^n (Y_i - \bar Y)^2}}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}} = \frac{\sum_{i=1}^n (x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)Y_i}{\sum_{i=1}^n (x_i - \bar x)^2}$$

is like $\hat\beta$ of Section 10.6 and has mean zero when $\rho = 0$. Thus, referring to $T_2$ of Section 10.6, we see that

$$\frac{R_c\sqrt{\sum (Y_i - \bar Y)^2}\Big/\sqrt{\sum (x_i - \bar x)^2}}{\sqrt{\dfrac{\sum_{i=1}^n \left\{Y_i - \bar Y - \left[R_c\sqrt{\sum_{j=1}^n (Y_j - \bar Y)^2}\Big/\sqrt{\sum_{j=1}^n (x_j - \bar x)^2}\right](x_i - \bar x)\right\}^2}{(n - 2)\sum_{j=1}^n (x_j - \bar x)^2}}} = \frac{R_c\sqrt{n - 2}}{\sqrt{1 - R_c^2}} \qquad (1)$$

has, given $X_1 = x_1, \ldots, X_n = x_n$, a conditional $t$-distribution with $n - 2$ degrees of freedom. Note that the p.d.f., say $g(t)$, of this $t$-distribution does not depend upon $x_1, x_2, \ldots, x_n$. Now the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and $R\sqrt{n - 2}/\sqrt{1 - R^2}$, where

$$R = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2 \sum_{i=1}^n (Y_i - \bar Y)^2}},$$

is the product of $g(t)$ and the joint p.d.f. of $X_1, X_2, \ldots, X_n$. Integration on $x_1, x_2, \ldots, x_n$ yields the marginal p.d.f. of $R\sqrt{n - 2}/\sqrt{1 - R^2}$; because $g(t)$ does not depend upon $x_1, x_2, \ldots, x_n$ it is obvious that this marginal p.d.f. is $g(t)$, the conditional p.d.f. of $R_c\sqrt{n - 2}/\sqrt{1 - R_c^2}$. The change-of-variable technique can now be used to find the p.d.f. of $R$.

Remarks. Since $R$ has, when $\rho = 0$, a conditional distribution that does not depend upon $x_1, x_2, \ldots, x_n$ (and hence that conditional distribution is, in fact, the marginal distribution of $R$), we have the remarkable fact that $R$ is independent of $X_1, X_2, \ldots, X_n$. It follows that $R$ is independent of every function of $X_1, X_2, \ldots, X_n$ alone, that is, a function that does not depend upon any $Y_i$. In like manner, $R$ is independent of every function of $Y_1, Y_2, \ldots, Y_n$ alone. Moreover, a careful review of the argument reveals that nowhere did we use the fact that $X$ has a normal marginal distribution. Thus, if $X$ and $Y$ are independent, and if $Y$ has a normal distribution, then $R$ has the same conditional distribution whatever be the distribution of $X$, subject to the condition $\sum_1^n (x_i - \bar x)^2 > 0$. Moreover, if $\Pr\left[\sum_1^n (X_i - \bar X)^2 > 0\right] = 1$, then $R$ has the same marginal distribution whatever be the distribution of $X$.

If we write $T = R\sqrt{n - 2}/\sqrt{1 - R^2}$, where $T$ has a $t$-distribution with $n - 2 > 0$ degrees of freedom, it is easy to show, by the change-of-variable technique (Exercise 10.34), that the p.d.f. of $R$ is given by

$$g(r) = \frac{\Gamma[(n - 1)/2]}{\Gamma(\tfrac12)\Gamma[(n - 2)/2]}(1 - r^2)^{(n - 4)/2}, \qquad -1 < r < 1, \qquad (2)$$
$$\phantom{g(r)} = 0 \quad \text{elsewhere}.$$

We have now solved the problem of the distribution of $R$, when $\rho = 0$ and $n > 2$, or, perhaps more conveniently, that of $R\sqrt{n - 2}/\sqrt{1 - R^2}$. The likelihood ratio test of the hypothesis $H_0: \rho = 0$ against all alternatives $H_1: \rho \ne 0$ may be based either on the statistic $R$ or on the statistic $R\sqrt{n - 2}/\sqrt{1 - R^2} = T$, although the latter is easier to use. In either case the significance level of the test is

$$\alpha = \Pr(|R| \ge c_1; H_0) = \Pr(|T| \ge c_2; H_0),$$

where the constants $c_1$ and $c_2$ are chosen so as to give the desired value of $\alpha$.
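The statistics $R$ and $T$ are simple to evaluate from a sample. The sketch below uses hypothetical data and hypothetical function names (`sample_correlation`, `t_statistic`); the formulas are those of this section.

```python
import math


def sample_correlation(xs, ys):
    """Correlation coefficient R of the paired sample."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """T = R sqrt(n - 2) / sqrt(1 - R^2); t with n - 2 d.f. when rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical paired observations (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(xs, ys)
t = t_statistic(r, len(xs))
print(r, t)
```

For these five pairs $|T|$ is below the tabled two-sided 5 percent critical value for 3 degrees of freedom (about 3.182), so $H_0: \rho = 0$ would not be rejected at that level. The value of $T$ also coincides with $\hat\beta$ divided by its estimated standard error from Section 10.6, as Equation (1) indicates.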

Remark. It is also possible to obtain an approximate test of size $\alpha$ by using the fact that

$$W = \frac12\ln\left(\frac{1 + R}{1 - R}\right)$$

has an approximate normal distribution with mean $\frac12\ln[(1 + \rho)/(1 - \rho)]$ and variance $1/(n - 3)$. We accept this statement without proof. Thus a test of $H_0: \rho = 0$ can be based on the statistic

$$Z = \frac{\frac12\ln[(1 + R)/(1 - R)] - \frac12\ln[(1 + \rho)/(1 - \rho)]}{\sqrt{1/(n - 3)}},$$

with $\rho = 0$ so that $\frac12\ln[(1 + \rho)/(1 - \rho)] = 0$. However, using $W$, we can also test hypotheses like $H_0: \rho = \rho_0$ against $H_1: \rho \ne \rho_0$, where $\rho_0$ is not necessarily zero. In that case the hypothesized mean of $W$ is

$$\frac12\ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right).$$
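The approximate test of the preceding remark amounts to a one-line computation. The following minimal sketch assumes only the normal approximation stated above; the function name `fisher_z` and the numerical values of $r$ and $n$ are hypothetical.

```python
import math


def fisher_z(r, n, rho0=0.0):
    """Approximate standard normal statistic for H0: rho = rho0.
    W = 0.5 ln[(1 + r)/(1 - r)] has approximate variance 1/(n - 3)."""
    w = 0.5 * math.log((1 + r) / (1 - r))
    mean0 = 0.5 * math.log((1 + rho0) / (1 - rho0))
    return (w - mean0) * math.sqrt(n - 3)   # same as dividing by sqrt(1/(n - 3))


# Hypothetical sample result: r = 0.5 from n = 28 pairs.
z_zero = fisher_z(0.5, 28)             # tests rho = 0
z_half = fisher_z(0.5, 28, rho0=0.5)   # tests rho = 0.5
print(z_zero, z_half)
```

The statistic would be compared with a critical value of the standard normal distribution, for example 1.96 for an approximate two-sided test of size 0.05.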


EXERCISES 

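10.31. Show that

$$R = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2 \sum_{i=1}^n (Y_i - \bar Y)^2}} = \frac{\sum_{i=1}^n X_iY_i - n\bar X\bar Y}{\sqrt{\left(\sum_{i=1}^n X_i^2 - n\bar X^2\right)\left(\sum_{i=1}^n Y_i^2 - n\bar Y^2\right)}}.$$

10.32. A random sample of size $n = 6$ from a bivariate normal distribution yields a value of the correlation coefficient of 0.89. Would we accept or reject, at the 5 percent significance level, the hypothesis that $\rho = 0$?

10.33. Verify Equation (1) of this section.

10.34. Verify the p.d.f. (2) of this section.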


10.8 The Distributions of Certain Quadratic Forms 

Remark. It is essential that the reader have the background of the 
multivariate normal distribution as given in Section 4.10 to understand 
Sections 10.8 and 10.9. 

Let $X_i$, $i = 1, 2, \ldots, n$, denote independent random variables which are $N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, respectively. Then $Q = \sum_1^n (X_i - \mu_i)^2/\sigma_i^2$ is $\chi^2(n)$. Now $Q$ is a quadratic form in the $X_i - \mu_i$ and $Q$ is seen to be, apart from the coefficient $-\frac12$, the random variable which is defined by the exponent on the number $e$ in the joint p.d.f. of $X_1, X_2, \ldots, X_n$. We shall now show that this result can be generalized.
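Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with p.d.f.

$$\frac{1}{(2\pi)^{n/2}\sqrt{|\mathbf{V}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{2}\right],$$

where, as usual, the covariance matrix $\mathbf{V}$ is positive definite. We shall show that the random variable $Q$ (a quadratic form in the $X_i - \mu_i$), which is defined by $(\mathbf{X} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{X} - \boldsymbol{\mu})$, is $\chi^2(n)$. We have for the m.g.f. $M(t)$ of $Q$ the integral

$$M(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2}\sqrt{|\mathbf{V}|}} \exp\left[t(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu}) - \frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{2}\right] dx_1 \cdots dx_n$$
$$\phantom{M(t)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2}\sqrt{|\mathbf{V}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})(1 - 2t)}{2}\right] dx_1 \cdots dx_n.$$

With $\mathbf{V}^{-1}$ positive definite, the integral is seen to exist for all real values of $t < \frac12$. Moreover, $(1 - 2t)\mathbf{V}^{-1}$, $t < \frac12$, is a positive definite matrix and, since $|(1 - 2t)\mathbf{V}^{-1}| = (1 - 2t)^n|\mathbf{V}^{-1}|$, it follows that

$$\frac{1}{(2\pi)^{n/2}\sqrt{|\mathbf{V}|/(1 - 2t)^n}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})(1 - 2t)}{2}\right]$$

can be treated as a multivariate normal p.d.f. If we multiply our integrand by $(1 - 2t)^{n/2}$, we have this multivariate p.d.f. Thus the m.g.f. of $Q$ is given by

$$M(t) = \frac{1}{(1 - 2t)^{n/2}}, \qquad t < \tfrac12,$$

and $Q$ is $\chi^2(n)$, as we wished to show. This fact is the basis of the chi-square tests that were discussed in Chapter 6.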

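The remarkable fact that the random variable which is defined by $(\mathbf{X} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{X} - \boldsymbol{\mu})$ is $\chi^2(n)$ stimulates a number of questions about quadratic forms in normally distributed variables. We would like to treat this problem in complete generality, but limitations of space forbid this, and we find it necessary to restrict ourselves to some special cases.

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a distribution which is $N(0, \sigma^2)$, $\sigma^2 > 0$. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ and let $\mathbf{A}$ denote an arbitrary $n \times n$ real symmetric matrix. We shall investigate the distribution of the quadratic form $\mathbf{X}'\mathbf{A}\mathbf{X}$. For instance, we know that $\mathbf{X}'\mathbf{I}_n\mathbf{X}/\sigma^2 = \mathbf{X}'\mathbf{X}/\sigma^2 = \sum_1^n X_i^2/\sigma^2$ is $\chi^2(n)$. First we shall find the m.g.f. of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$. Then we shall investigate the conditions that must be imposed upon the real symmetric matrix $\mathbf{A}$ if $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is to have a chi-square distribution. This m.g.f. is given by

$$M(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left(\frac{t\mathbf{x}'\mathbf{A}\mathbf{x}}{\sigma^2} - \frac{\mathbf{x}'\mathbf{x}}{2\sigma^2}\right) dx_1 \cdots dx_n$$
$$\phantom{M(t)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{\mathbf{x}'(\mathbf{I} - 2t\mathbf{A})\mathbf{x}}{2\sigma^2}\right] dx_1 \cdots dx_n,$$

where $\mathbf{I} = \mathbf{I}_n$. The matrix $\mathbf{I} - 2t\mathbf{A}$ is positive definite if we take $|t|$ sufficiently small, say $|t| < h$, $h > 0$. Moreover, we can treat

$$\frac{1}{(2\pi)^{n/2}\sqrt{|(\mathbf{I} - 2t\mathbf{A})^{-1}\sigma^2|}} \exp\left[-\frac{\mathbf{x}'(\mathbf{I} - 2t\mathbf{A})\mathbf{x}}{2\sigma^2}\right]$$

as a multivariate normal p.d.f. Now $|(\mathbf{I} - 2t\mathbf{A})^{-1}\sigma^2|^{1/2} = \sigma^n/|\mathbf{I} - 2t\mathbf{A}|^{1/2}$. If we multiply our integrand by $|\mathbf{I} - 2t\mathbf{A}|^{1/2}$, we have this multivariate p.d.f. Hence the m.g.f. of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is given by

$$M(t) = |\mathbf{I} - 2t\mathbf{A}|^{-1/2}, \qquad |t| < h. \qquad (1)$$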

It proves useful to express this m.g.f. in a different form. To do this, let $a_1, a_2, \ldots, a_n$ denote the characteristic numbers of $\mathbf{A}$ and let $\mathbf{L}$ denote an $n \times n$ orthogonal matrix such that $\mathbf{L}'\mathbf{A}\mathbf{L} = \mathrm{diag}[a_1, a_2, \ldots, a_n]$. Thus

$$\mathbf{L}'(\mathbf{I} - 2t\mathbf{A})\mathbf{L} = \begin{bmatrix} 1 - 2ta_1 & 0 & \cdots & 0 \\ 0 & 1 - 2ta_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 - 2ta_n \end{bmatrix}.$$

Then

$$\prod_{i=1}^n (1 - 2ta_i) = |\mathbf{L}'(\mathbf{I} - 2t\mathbf{A})\mathbf{L}| = |\mathbf{I} - 2t\mathbf{A}|.$$

Accordingly, we can write $M(t)$, as given in Equation (1), in the form

$$M(t) = \left[\prod_{i=1}^n (1 - 2ta_i)\right]^{-1/2}, \qquad |t| < h. \qquad (2)$$

Let $r$, $0 < r \le n$, denote the rank of the real symmetric matrix $\mathbf{A}$. Then exactly $r$ of the real numbers $a_1, a_2, \ldots, a_n$, say $a_1, \ldots, a_r$, are not zero and exactly $n - r$ of these numbers, say $a_{r+1}, \ldots, a_n$, are zero. Thus we can write the m.g.f. of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ as

$$M(t) = [(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r)]^{-1/2}.$$
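The agreement between forms (1) and (2) of the m.g.f. can be checked numerically on a small case. The sketch below is an illustration only: the $2 \times 2$ matrix is a hypothetical choice whose characteristic numbers, 1 and 3, are known in closed form, and the function names are not from the text.

```python
# Hypothetical real symmetric matrix with characteristic numbers 1 and 3.
A = [[2.0, 1.0],
     [1.0, 2.0]]
CHARACTERISTIC_NUMBERS = [1.0, 3.0]


def mgf_from_determinant(t):
    """Form (1): |I - 2tA|^(-1/2), written out for a 2-by-2 matrix."""
    m00 = 1 - 2 * t * A[0][0]
    m01 = -2 * t * A[0][1]
    m10 = -2 * t * A[1][0]
    m11 = 1 - 2 * t * A[1][1]
    return (m00 * m11 - m01 * m10) ** -0.5


def mgf_from_characteristic_numbers(t):
    """Form (2): [prod (1 - 2 t a_i)]^(-1/2)."""
    product = 1.0
    for a_i in CHARACTERISTIC_NUMBERS:
        product *= 1 - 2 * t * a_i
    return product ** -0.5


# Both forms require I - 2tA positive definite; here that means t < 1/6.
for t in (-0.2, 0.0, 0.1):
    print(t, mgf_from_determinant(t), mgf_from_characteristic_numbers(t))
```

Because one characteristic number differs from 1, this $M(t)$ is not of the form $(1 - 2t)^{-k/2}$, so for this particular matrix $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is not chi-square, in line with the argument that follows.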

Now that we have found, in suitable form, the m.g.f. of our random variable, let us turn to the question of the conditions that must be imposed if $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is to have a chi-square distribution. Assume that $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is $\chi^2(k)$. Then

$$M(t) = [(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r)]^{-1/2} = (1 - 2t)^{-k/2},$$

or, equivalently,

$$(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r) = (1 - 2t)^k, \qquad |t| < h.$$

Because the positive integers $r$ and $k$ are the degrees of these polynomials, and because these polynomials are equal for infinitely many values of $t$, we have $k = r$, the rank of $\mathbf{A}$. Moreover, the uniqueness of the factorization of a polynomial implies that $a_1 = a_2 = \cdots = a_r = 1$. If each of the nonzero characteristic numbers of a real symmetric matrix is one, the matrix is idempotent, that is, $\mathbf{A}^2 = \mathbf{A}$, and conversely (see Exercise 10.38). Accordingly, if $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ has a chi-square distribution, then $\mathbf{A}^2 = \mathbf{A}$ and the random variable is $\chi^2(r)$, where $r$ is the rank of $\mathbf{A}$. Conversely, if $\mathbf{A}$ is of rank $r$, $0 < r \le n$, and if $\mathbf{A}^2 = \mathbf{A}$, then $\mathbf{A}$ has exactly $r$ characteristic numbers that are equal to one, and the remaining $n - r$ characteristic numbers are equal to zero. Thus the m.g.f. of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is given by $(1 - 2t)^{-r/2}$, $t < \frac12$, and $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is $\chi^2(r)$. This establishes the following theorem.

Theorem 2. Let $Q$ denote a random variable which is a quadratic form in the observations of a random sample of size $n$ from a distribution which is $N(0, \sigma^2)$. Let $\mathbf{A}$ denote the symmetric matrix of $Q$ and let $r$, $0 < r \le n$, denote the rank of $\mathbf{A}$. Then $Q/\sigma^2$ is $\chi^2(r)$ if and only if $\mathbf{A}^2 = \mathbf{A}$.

Remark. If the normal distribution in Theorem 2 is $N(\mu, \sigma^2)$, the condition $\mathbf{A}^2 = \mathbf{A}$ remains a necessary and sufficient condition that $Q/\sigma^2$ have a chi-square distribution. In general, however, $Q/\sigma^2$ is not $\chi^2(r)$ but, instead, $Q/\sigma^2$ has a noncentral chi-square distribution if $\mathbf{A}^2 = \mathbf{A}$. The number of degrees of freedom is $r$, the rank of $\mathbf{A}$, and the noncentrality parameter is $\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}/\sigma^2$, where $\boldsymbol{\mu}' = [\mu, \mu, \ldots, \mu]$. Since $\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu} = \mu^2\sum_j\sum_i a_{ij}$, where $\mathbf{A} = [a_{ij}]$, then, if $\mu \ne 0$, the conditions $\mathbf{A}^2 = \mathbf{A}$ and $\sum_j\sum_i a_{ij} = 0$ are necessary and sufficient conditions that $Q/\sigma^2$ be central $\chi^2(r)$. Moreover, the theorem may be extended to a quadratic form in random variables which have a multivariate normal distribution with positive definite covariance matrix $\mathbf{V}$; here the necessary and sufficient condition that $Q$ have a chi-square distribution is $\mathbf{A}\mathbf{V}\mathbf{A} = \mathbf{A}$.
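The condition of Theorem 2 can be checked mechanically for a given matrix. As an illustration, the sketch below takes $n = 4$ and the two matrices $(1/n)\mathbf{P}$ and $\mathbf{I} - (1/n)\mathbf{P}$, where $\mathbf{P}$ has every element equal to one; these are the matrices of $n\bar X^2$ and $nS^2$ that appear in the exercises of this section. The helper names `matmul`, `trace`, and `is_idempotent` are hypothetical, and the fact that the rank of a real symmetric idempotent matrix equals its trace (Exercise 10.39) is used to read off the degrees of freedom.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def is_idempotent(a, tol=1e-12):
    """Check A^2 = A elementwise, up to a small tolerance."""
    a2 = matmul(a, a)
    n = len(a)
    return all(abs(a2[i][j] - a[i][j]) <= tol
               for i in range(n) for j in range(n))


n = 4
A = [[1.0 / n for _ in range(n)] for _ in range(n)]               # (1/n) P
B = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]       # I - (1/n) P
     for i in range(n)]

print(is_idempotent(A), trace(A))   # degrees of freedom of n Xbar^2 / sigma^2
print(is_idempotent(B), trace(B))   # degrees of freedom of n S^2 / sigma^2
AB = matmul(A, B)                   # the zero matrix
```

For a matrix that fails the check, such as the one with characteristic numbers 1 and 3, the quadratic form divided by $\sigma^2$ does not have a chi-square distribution.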

EXERCISES 

10.35. Let Q = X { X 2 — where X t , X 2 , is a random sample of size 

4 from a distribution which is jV(0, a 1 ). Show that Q/a 2 does not have a 
chi-square distribution. Find the m.g.f. of Qla 2 . - 

10.36. Let $\mathbf{X}' = [X_1, X_2]$ be bivariate normal with matrix of means
$\boldsymbol{\mu}' = [\mu_1, \mu_2]$ and positive definite covariance matrix $\mathbf{V}$. Let

$$Q_1 = \frac{X_1^2}{\sigma_1^2(1 - \rho^2)} - \frac{2\rho X_1 X_2}{\sigma_1\sigma_2(1 - \rho^2)} + \frac{X_2^2}{\sigma_2^2(1 - \rho^2)}.$$

Show that $Q_1$ is $\chi^2(r, \theta)$ and find $r$ and $\theta$. When and only when does $Q_1$ have
a central chi-square distribution?

10.37. Let $\mathbf{X}' = [X_1, X_2, X_3]$ denote a random sample of size 3 from a
distribution that is $N(4, 8)$ and let



Justify the assertion that $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ is $\chi^2(2, 6)$.

10.38. Let A be a real symmetric matrix. Prove that each of the nonzero 
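characteristic numbers of $\mathbf{A}$ is equal to 1 if and only if $\mathbf{A}^2 = \mathbf{A}$.

Hint: Let $\mathbf{L}$ be an orthogonal matrix such that $\mathbf{L}'\mathbf{A}\mathbf{L} =
\operatorname{diag}[a_1, a_2, \ldots, a_n]$ and note that $\mathbf{A}$ is idempotent if and only if $\mathbf{L}'\mathbf{A}\mathbf{L}$ is
idempotent.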

10.39. The sum of the elements on the principal diagonal of a square matrix 
A is called the trace of A and is denoted by tr A. 

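(a) If $\mathbf{B}$ is $n \times m$ and $\mathbf{C}$ is $m \times n$, prove that $\operatorname{tr}(\mathbf{B}\mathbf{C}) = \operatorname{tr}(\mathbf{C}\mathbf{B})$.

(b) If $\mathbf{A}$ is a square matrix and if $\mathbf{L}$ is an orthogonal matrix, use the result
of part (a) to show that $\operatorname{tr}(\mathbf{L}'\mathbf{A}\mathbf{L}) = \operatorname{tr}\mathbf{A}$.

(c) If $\mathbf{A}$ is a real symmetric idempotent matrix, use the result of part (b)
to prove that the rank of $\mathbf{A}$ is equal to $\operatorname{tr}\mathbf{A}$.

10.40. Let $\mathbf{A} = [a_{ij}]$ be a real symmetric matrix. Prove that $\sum_i \sum_j a_{ij}^2$ is equal to
the sum of the squares of the characteristic numbers of $\mathbf{A}$.

Hint: If $\mathbf{L}$ is an orthogonal matrix, show that $\sum_j \sum_i a_{ij}^2 =
\operatorname{tr}(\mathbf{A}^2) = \operatorname{tr}(\mathbf{L}'\mathbf{A}^2\mathbf{L}) = \operatorname{tr}[(\mathbf{L}'\mathbf{A}\mathbf{L})(\mathbf{L}'\mathbf{A}\mathbf{L})]$.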

10.41. Let $\bar{X}$ and $S^2$ denote, respectively, the mean and the variance of a
random sample of size $n$ from a distribution which is $N(0, \sigma^2)$.

(a) If $\mathbf{A}$ denotes the symmetric matrix of $n\bar{X}^2$, show that $\mathbf{A} = (1/n)\mathbf{P}$, where
$\mathbf{P}$ is the $n \times n$ matrix, each of whose elements is equal to one.

(b) Demonstrate that $\mathbf{A}$ is idempotent and that $\operatorname{tr}\mathbf{A} = 1$. Thus $n\bar{X}^2/\sigma^2$
is $\chi^2(1)$.

(c) Show that the symmetric matrix $\mathbf{B}$ of $nS^2$ is $\mathbf{I} - (1/n)\mathbf{P}$.

(d) Demonstrate that $\mathbf{B}$ is idempotent and that $\operatorname{tr}\mathbf{B} = n - 1$. Thus $nS^2/\sigma^2$
is $\chi^2(n - 1)$, as previously proved otherwise.

(e) Show that the product matrix $\mathbf{A}\mathbf{B}$ is the zero matrix.


10.9 The Independence of Certain Quadratic Forms 


We have previously investigated the independence of linear 
functions of normally distributed variables (see Exercise 4.132). In this 
section we shall prove some theorems about the independence of 
quadratic forms. As we remarked on p. 483, we shall confine our 
attention to normally distributed variables that constitute a random 
sample of size $n$ from a distribution that is $N(0, \sigma^2)$.

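Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Let $\mathbf{A}$ and $\mathbf{B}$ denote two real symmetric
matrices, each of order $n$. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ and consider the
two quadratic forms $\mathbf{X}'\mathbf{A}\mathbf{X}$ and $\mathbf{X}'\mathbf{B}\mathbf{X}$. We wish to show that these
quadratic forms are independent if and only if $\mathbf{A}\mathbf{B} = \mathbf{0}$, the zero matrix.
We shall first compute the m.g.f. $M(t_1, t_2)$ of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ and $\mathbf{X}'\mathbf{B}\mathbf{X}/\sigma^2$.
We have

$$M(t_1, t_2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(\frac{t_1\mathbf{x}'\mathbf{A}\mathbf{x}}{\sigma^2} + \frac{t_2\mathbf{x}'\mathbf{B}\mathbf{x}}{\sigma^2} - \frac{\mathbf{x}'\mathbf{x}}{2\sigma^2}\right) dx_1 \cdots dx_n$$

$$= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[-\frac{\mathbf{x}'(\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B})\mathbf{x}}{2\sigma^2}\right] dx_1 \cdots dx_n.$$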



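The matrix $\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}$ is positive definite if we take $|t_1|$ and $|t_2|$
sufficiently small, say $|t_1| < h_1$, $|t_2| < h_2$, where $h_1, h_2 > 0$. Then, as on
p. 483, we have

$$M(t_1, t_2) = |\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}|^{-1/2}, \qquad |t_1| < h_1, \; |t_2| < h_2.$$

Let us assume that $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ and $\mathbf{X}'\mathbf{B}\mathbf{X}/\sigma^2$ are independent (so that
likewise are $\mathbf{X}'\mathbf{A}\mathbf{X}$ and $\mathbf{X}'\mathbf{B}\mathbf{X}$) and prove that $\mathbf{A}\mathbf{B} = \mathbf{0}$. Thus we assume
that

$$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2) \qquad (1)$$

for all $t_1$ and $t_2$ for which $|t_i| < h_i$, $i = 1, 2$. Identity (1) is equivalent to
the identity

$$|\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}| = |\mathbf{I} - 2t_1\mathbf{A}|\,|\mathbf{I} - 2t_2\mathbf{B}|, \qquad |t_i| < h_i, \; i = 1, 2. \qquad (2)$$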

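Let $r > 0$ denote the rank of $\mathbf{A}$ and let $a_1, a_2, \ldots, a_r$ denote the $r$
nonzero characteristic numbers of $\mathbf{A}$. There exists an orthogonal
matrix $\mathbf{L}$ such that

$$\mathbf{L}'\mathbf{A}\mathbf{L} = \begin{bmatrix} \mathbf{C}_{11} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \mathbf{C}, \qquad \mathbf{C}_{11} = \operatorname{diag}[a_1, a_2, \ldots, a_r],$$

for a suitable ordering of $a_1, a_2, \ldots, a_r$. Then $\mathbf{L}'\mathbf{B}\mathbf{L}$ may be written
in the identically partitioned form

$$\mathbf{L}'\mathbf{B}\mathbf{L} = \begin{bmatrix} \mathbf{D}_{11} & \mathbf{D}_{12} \\ \mathbf{D}_{21} & \mathbf{D}_{22} \end{bmatrix} = \mathbf{D}.$$

The identity (2) may be written as

$$|\mathbf{L}'|\,|\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}|\,|\mathbf{L}| = |\mathbf{L}'|\,|\mathbf{I} - 2t_1\mathbf{A}|\,|\mathbf{L}|\,|\mathbf{L}'|\,|\mathbf{I} - 2t_2\mathbf{B}|\,|\mathbf{L}|, \qquad (2')$$

or as

$$|\mathbf{I} - 2t_1\mathbf{C} - 2t_2\mathbf{D}| = |\mathbf{I} - 2t_1\mathbf{C}|\,|\mathbf{I} - 2t_2\mathbf{D}|. \qquad (3)$$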

The coefficient of $(-2t_1)^r$ in the right-hand member of Equation (3) is
seen by inspection to be $a_1 a_2 \cdots a_r\,|\mathbf{I} - 2t_2\mathbf{D}|$. It is not so easy to find
the coefficient of $(-2t_1)^r$ in the left-hand member of Equation (3).
Conceive of expanding this determinant in terms of minors of order $r$
formed from the first $r$ columns. One term in this expansion is the
product of the minor of order $r$ in the upper left-hand corner, namely,
$|\mathbf{I}_r - 2t_1\mathbf{C}_{11} - 2t_2\mathbf{D}_{11}|$, and the minor of order $n - r$ in the lower
right-hand corner, namely, $|\mathbf{I}_{n-r} - 2t_2\mathbf{D}_{22}|$. Moreover, this product is
the only term in the expansion of the determinant that involves
$(-2t_1)^r$. Thus the coefficient of $(-2t_1)^r$ in the left-hand member of
Equation (3) is $a_1 a_2 \cdots a_r\,|\mathbf{I}_{n-r} - 2t_2\mathbf{D}_{22}|$. If we equate these coefficients
of $(-2t_1)^r$, we have, for all $t_2$, $|t_2| < h_2$,

$$|\mathbf{I} - 2t_2\mathbf{D}| = |\mathbf{I}_{n-r} - 2t_2\mathbf{D}_{22}|. \qquad (4)$$

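Equation (4) implies that the nonzero characteristic numbers of the
matrices $\mathbf{D}$ and $\mathbf{D}_{22}$ are the same (see Exercise 10.49). Recall that the
sum of the squares of the characteristic numbers of a symmetric matrix
is equal to the sum of the squares of the elements of that matrix (see
Exercise 10.40). Thus the sum of the squares of the elements of matrix
$\mathbf{D}$ is equal to the sum of the squares of the elements of $\mathbf{D}_{22}$. Since the
elements of the matrix $\mathbf{D}$ are real, it follows that each of the elements
of $\mathbf{D}_{11}$, $\mathbf{D}_{12}$, and $\mathbf{D}_{21}$ is zero. Accordingly, we can write $\mathbf{D}$ in the form

$$\mathbf{D} = \mathbf{L}'\mathbf{B}\mathbf{L} = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{D}_{22} \end{bmatrix}.$$

Thus $\mathbf{C}\mathbf{D} = \mathbf{L}'\mathbf{A}\mathbf{L}\mathbf{L}'\mathbf{B}\mathbf{L} = \mathbf{0}$ and $\mathbf{L}'\mathbf{A}\mathbf{B}\mathbf{L} = \mathbf{0}$ and $\mathbf{A}\mathbf{B} = \mathbf{0}$, as we wished
to prove.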

To complete the proof of the theorem, we assume that $\mathbf{A}\mathbf{B} = \mathbf{0}$. We
are to show that $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ and $\mathbf{X}'\mathbf{B}\mathbf{X}/\sigma^2$ are independent. We have, for
all real values of $t_1$ and $t_2$,

$$(\mathbf{I} - 2t_1\mathbf{A})(\mathbf{I} - 2t_2\mathbf{B}) = \mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B},$$

since $\mathbf{A}\mathbf{B} = \mathbf{0}$. Thus

$$|\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}| = |\mathbf{I} - 2t_1\mathbf{A}|\,|\mathbf{I} - 2t_2\mathbf{B}|.$$

Since the m.g.f. of $\mathbf{X}'\mathbf{A}\mathbf{X}/\sigma^2$ and $\mathbf{X}'\mathbf{B}\mathbf{X}/\sigma^2$ is given by

$$M(t_1, t_2) = |\mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}|^{-1/2}, \qquad |t_i| < h_i, \; i = 1, 2,$$

we have

$$M(t_1, t_2) = M(t_1, 0)\,M(0, t_2),$$

and the proof of the following theorem is complete.

Theorem 3. Let $Q_1$ and $Q_2$ denote random variables which are
quadratic forms in the observations of a random sample of size $n$ from
a distribution which is $N(0, \sigma^2)$. Let $\mathbf{A}$ and $\mathbf{B}$ denote, respectively, the real
symmetric matrices of $Q_1$ and $Q_2$. The random variables $Q_1$ and $Q_2$ are
independent if and only if $\mathbf{A}\mathbf{B} = \mathbf{0}$.

Remark. Theorem 3 remains valid if the random sample is from a
distribution which is $N(\mu, \sigma^2)$, whatever be the real value of $\mu$. Moreover,
Theorem 3 may be extended to quadratic forms in random variables that have
a joint multivariate normal distribution with a positive definite covariance
matrix $\mathbf{V}$. The necessary and sufficient condition for the independence
of two such quadratic forms with symmetric matrices $\mathbf{A}$ and $\mathbf{B}$ then
becomes $\mathbf{A}\mathbf{V}\mathbf{B} = \mathbf{0}$. In our Theorem 3, we have $\mathbf{V} = \sigma^2\mathbf{I}$, so that
$\mathbf{A}\mathbf{V}\mathbf{B} = \mathbf{A}\sigma^2\mathbf{I}\mathbf{B} = \sigma^2\mathbf{A}\mathbf{B} = \mathbf{0}$.
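
The two facts used in the proof of Theorem 3, namely $\mathbf{A}\mathbf{B} = \mathbf{0}$ and the factorization of the determinant in identity (2), can be checked on a small case. The sketch below is an illustration only: the matrices `A` and `B`, the values of `t1` and `t2`, and the helper names are all assumptions introduced here, and exact fractions are used so that equality can be tested directly.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Hypothetical forms: Q_A = (X1 + X2)^2 / 2 and Q_B = (X1 - X2)^2 / 2 + X3^2.
A = [[half, half, 0], [half, half, 0], [0, 0, 0]]
B = [[half, -half, 0], [-half, half, 0], [0, 0, 1]]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def mgf_matrix(t1, t2):
    """Return I - 2*t1*A - 2*t2*B, the matrix inside the m.g.f. determinant."""
    n = len(A)
    return [[(1 if i == j else 0) - 2 * t1 * A[i][j] - 2 * t2 * B[i][j]
             for j in range(n)] for i in range(n)]


# Condition of Theorem 3: the product AB is the zero matrix.
product_is_zero = all(entry == 0 for row in matmul(A, B) for entry in row)

# Identity (2) at one arbitrary point (t1, t2) near the origin.
t1, t2 = Fraction(1, 10), Fraction(1, 7)
lhs = det(mgf_matrix(t1, t2))
rhs = det(mgf_matrix(t1, 0)) * det(mgf_matrix(0, t2))
factorizes = lhs == rhs

print(product_is_zero, factorizes)
```

A single point $(t_1, t_2)$ does not prove the identity, of course; the check only makes the algebraic step $(\mathbf{I} - 2t_1\mathbf{A})(\mathbf{I} - 2t_2\mathbf{B}) = \mathbf{I} - 2t_1\mathbf{A} - 2t_2\mathbf{B}$ concrete.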


We shall next prove Theorem 1 that was stated in Section 10.1. 

Theorem 4. Let $Q = Q_1 + \cdots + Q_{k-1} + Q_k$, where
$Q, Q_1, \ldots, Q_{k-1}, Q_k$ are $k + 1$ random variables that are quadratic forms
in the observations of a random sample of size $n$ from a distribution which
is $N(0, \sigma^2)$. Let $Q/\sigma^2$ be $\chi^2(r)$, let $Q_i/\sigma^2$ be $\chi^2(r_i)$, $i = 1, 2, \ldots, k - 1$, and
let $Q_k$ be nonnegative. Then the random variables $Q_1, Q_2, \ldots, Q_k$ are
independent and, hence, $Q_k/\sigma^2$ is $\chi^2(r_k = r - r_1 - \cdots - r_{k-1})$.


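Proof. Take first the case of $k = 2$ and let the real symmetric
matrices of $Q$, $Q_1$, and $Q_2$ be denoted, respectively, by $\mathbf{A}$, $\mathbf{A}_1$, $\mathbf{A}_2$. We
are given that $Q = Q_1 + Q_2$ or, equivalently, that $\mathbf{A} = \mathbf{A}_1 + \mathbf{A}_2$. We are
also given that $Q/\sigma^2$ is $\chi^2(r)$ and that $Q_1/\sigma^2$ is $\chi^2(r_1)$. In accordance with
Theorem 2, p. 484, we have $\mathbf{A}^2 = \mathbf{A}$ and $\mathbf{A}_1^2 = \mathbf{A}_1$. Since $Q_2 \ge 0$, each
of the matrices $\mathbf{A}$, $\mathbf{A}_1$, and $\mathbf{A}_2$ is positive semidefinite. Because $\mathbf{A}^2 = \mathbf{A}$,
we can find an orthogonal matrix $\mathbf{L}$ such that

$$\mathbf{L}'\mathbf{A}\mathbf{L} = \begin{bmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.$$

If then we multiply both members of $\mathbf{A} = \mathbf{A}_1 + \mathbf{A}_2$ on the left by $\mathbf{L}'$ and
on the right by $\mathbf{L}$, we have

$$\begin{bmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \mathbf{L}'\mathbf{A}_1\mathbf{L} + \mathbf{L}'\mathbf{A}_2\mathbf{L}.$$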

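Now each of $\mathbf{A}_1$ and $\mathbf{A}_2$, and hence each of $\mathbf{L}'\mathbf{A}_1\mathbf{L}$ and $\mathbf{L}'\mathbf{A}_2\mathbf{L}$, is positive
semidefinite. Recall that, if a real symmetric matrix is positive
semidefinite, each element on the principal diagonal is positive or
zero. Moreover, if an element on the principal diagonal is zero, then
all elements in that row and all elements in that column are zero. Thus
$\mathbf{L}'\mathbf{A}\mathbf{L} = \mathbf{L}'\mathbf{A}_1\mathbf{L} + \mathbf{L}'\mathbf{A}_2\mathbf{L}$ can be written as

$$\begin{bmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \begin{bmatrix} \mathbf{G}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{H}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}. \qquad (5)$$

Since $\mathbf{A}_1^2 = \mathbf{A}_1$, we have

$$(\mathbf{L}'\mathbf{A}_1\mathbf{L})^2 = \mathbf{L}'\mathbf{A}_1\mathbf{L} = \begin{bmatrix} \mathbf{G}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.$$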


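If we multiply both members of Equation (5) on the left by the matrix
$\mathbf{L}'\mathbf{A}_1\mathbf{L}$, we see that

$$\begin{bmatrix} \mathbf{G}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} = \begin{bmatrix} \mathbf{G}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{G}_r\mathbf{H}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$$

or, equivalently, $\mathbf{L}'\mathbf{A}_1\mathbf{L} = \mathbf{L}'\mathbf{A}_1\mathbf{L} + (\mathbf{L}'\mathbf{A}_1\mathbf{L})(\mathbf{L}'\mathbf{A}_2\mathbf{L})$. Thus
$(\mathbf{L}'\mathbf{A}_1\mathbf{L})(\mathbf{L}'\mathbf{A}_2\mathbf{L}) = \mathbf{0}$ and $\mathbf{A}_1\mathbf{A}_2 = \mathbf{0}$. In accordance with Theorem 3, $Q_1$ and $Q_2$
are independent. This independence immediately implies that $Q_2/\sigma^2$ is
$\chi^2(r_2 = r - r_1)$. This completes the proof when $k = 2$. For $k > 2$, the
proof may be made by induction. We shall merely indicate how this
can be done by using $k = 3$. Take $\mathbf{A} = \mathbf{A}_1 + \mathbf{A}_2 + \mathbf{A}_3$, where $\mathbf{A}^2 = \mathbf{A}$,
$\mathbf{A}_1^2 = \mathbf{A}_1$, $\mathbf{A}_2^2 = \mathbf{A}_2$, and $\mathbf{A}_3$ is positive semidefinite. Write
$\mathbf{A} = \mathbf{A}_1 + (\mathbf{A}_2 + \mathbf{A}_3) = \mathbf{A}_1 + \mathbf{B}_1$, say. Now $\mathbf{A}^2 = \mathbf{A}$, $\mathbf{A}_1^2 = \mathbf{A}_1$, and $\mathbf{B}_1$ is
positive semidefinite. In accordance with the case of $k = 2$, we have
$\mathbf{A}_1\mathbf{B}_1 = \mathbf{0}$, so that $\mathbf{B}_1^2 = \mathbf{B}_1$. With $\mathbf{B}_1 = \mathbf{A}_2 + \mathbf{A}_3$, where $\mathbf{B}_1^2 = \mathbf{B}_1$,
$\mathbf{A}_2^2 = \mathbf{A}_2$, it follows from the case of $k = 2$ that $\mathbf{A}_2\mathbf{A}_3 = \mathbf{0}$ and $\mathbf{A}_3^2 = \mathbf{A}_3$.
If we regroup by writing $\mathbf{A} = \mathbf{A}_2 + (\mathbf{A}_1 + \mathbf{A}_3)$, we obtain $\mathbf{A}_1\mathbf{A}_3 = \mathbf{0}$, and
so on.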


Remark. In our statement of Theorem 4 we took $X_1, X_2, \ldots, X_n$ to be
observations of a random sample from a distribution which is $N(0, \sigma^2)$. We
did this because our proof of Theorem 3 was restricted to that case. In fact,
if $Q', Q'_1, \ldots, Q'_k$ are quadratic forms in any normal variables (including
multivariate normal variables), if $Q' = Q'_1 + \cdots + Q'_k$, if $Q', Q'_1, \ldots, Q'_{k-1}$
are central or noncentral chi-square, and if $Q'_k$ is nonnegative, then $Q'_1, \ldots, Q'_k$
are independent and $Q'_k$ is either central or noncentral chi-square.

This section will conclude with a proof of a frequently quoted
theorem due to Cochran.

Theorem 5. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution which is $N(0, \sigma^2)$. Let the sum of the squares of these
observations be written in the form

$$\sum_{i=1}^{n} X_i^2 = Q_1 + Q_2 + \cdots + Q_k,$$

where $Q_j$ is a quadratic form in $X_1, X_2, \ldots, X_n$, with matrix $\mathbf{A}_j$ which
has rank $r_j$, $j = 1, 2, \ldots, k$. The random variables $Q_1, Q_2, \ldots, Q_k$ are
independent and $Q_j/\sigma^2$ is $\chi^2(r_j)$, $j = 1, 2, \ldots, k$, if and only if $\sum_{j=1}^{k} r_j = n$.

Proof. First assume the two conditions $\sum_{j=1}^{k} r_j = n$ and $\sum_{i=1}^{n} X_i^2 = \sum_{j=1}^{k} Q_j$
to be satisfied. The latter equation implies that $\mathbf{I} = \mathbf{A}_1 +
\mathbf{A}_2 + \cdots + \mathbf{A}_k$. Let $\mathbf{B}_i = \mathbf{I} - \mathbf{A}_i$. That is, $\mathbf{B}_i$ is the sum of the matrices
$\mathbf{A}_1, \ldots, \mathbf{A}_k$ exclusive of $\mathbf{A}_i$. Let $R_i$ denote the rank of $\mathbf{B}_i$. Since the rank
of the sum of several matrices is less than or equal to the sum of the
ranks, we have $R_i \le \sum_{j=1}^{k} r_j - r_i = n - r_i$. However, $\mathbf{I} = \mathbf{A}_i + \mathbf{B}_i$, so that
$n \le r_i + R_i$ and $n - r_i \le R_i$. Hence $R_i = n - r_i$. The characteristic
numbers of $\mathbf{B}_i$ are the roots of the equation $|\mathbf{B}_i - \lambda\mathbf{I}| = 0$. Since
$\mathbf{B}_i = \mathbf{I} - \mathbf{A}_i$, this equation can be written as $|\mathbf{I} - \mathbf{A}_i - \lambda\mathbf{I}| = 0$. Thus we
have $|\mathbf{A}_i - (1 - \lambda)\mathbf{I}| = 0$. But each root of the last equation is one minus
a characteristic number of $\mathbf{A}_i$. Since $\mathbf{B}_i$ has exactly $n - R_i = r_i$
characteristic numbers that are zero, then $\mathbf{A}_i$ has exactly $r_i$ characteristic
numbers that are equal to 1. However, $r_i$ is the rank of $\mathbf{A}_i$. Thus each
of the $r_i$ nonzero characteristic numbers of $\mathbf{A}_i$ is 1. That is, $\mathbf{A}_i^2 = \mathbf{A}_i$ and
thus $Q_i/\sigma^2$ is $\chi^2(r_i)$, $i = 1, 2, \ldots, k$. In accordance with Theorem 4, the
random variables $Q_1, Q_2, \ldots, Q_k$ are independent.

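To complete the proof of Theorem 5, take

$$\sum_{i=1}^{n} X_i^2 = Q_1 + Q_2 + \cdots + Q_k,$$

let $Q_1, Q_2, \ldots, Q_k$ be independent, and let $Q_j/\sigma^2$ be $\chi^2(r_j)$,
$j = 1, 2, \ldots, k$. Then $\sum_{j=1}^{k} Q_j/\sigma^2$ is $\chi^2\!\left(\sum_{j=1}^{k} r_j\right)$. But $\sum_{j=1}^{k} Q_j/\sigma^2 = \sum_{i=1}^{n} X_i^2/\sigma^2$ is
$\chi^2(n)$. Thus $\sum_{j=1}^{k} r_j = n$ and the proof is complete.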
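
The rank condition of Theorem 5 can be illustrated on the familiar decomposition $\sum X_i^2 = n\bar{X}^2 + \sum (X_i - \bar{X})^2$. The following sketch is an illustration under assumptions made here: it takes $n = 3$, uses the matrices $(1/n)\mathbf{P}$ and $\mathbf{I} - (1/n)\mathbf{P}$ for the two forms, and computes ranks with a small hypothetical helper `rank` based on Gaussian elimination in exact arithmetic.

```python
from fractions import Fraction


def rank(m):
    """Rank of a matrix of Fractions, by Gaussian elimination."""
    rows = [list(r) for r in m]
    rk = 0
    for col in range(len(rows[0])):
        # Find a row at or below position rk with a nonzero entry in this column.
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


n = 3
# A1 is the matrix of n * Xbar^2; A2 is the matrix of sum (X_i - Xbar)^2.
A1 = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]
A2 = [[Fraction(int(i == j)) - A1[i][j] for j in range(n)] for i in range(n)]

r1, r2 = rank(A1), rank(A2)
ranks_sum_to_n = (r1 + r2 == n)

print(r1, r2, ranks_sum_to_n)
```

Here the ranks are 1 and $n - 1$, which add to $n$, so Theorem 5 gives the independence of the two forms and their chi-square distributions in one step.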

EXERCISES 

10.42. Let $X_1, X_2, X_3$ be a random sample from the normal distribution
$N(0, \sigma^2)$. Are the quadratic forms $X_1^2 + 3X_1X_2 + X_2^2 + X_1X_3 + X_3^2$ and
$X_1^2 - 2X_1X_2 + \tfrac{2}{3}X_2^2 - 2X_1X_3 - X_3^2$ independent or dependent?

10.43. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Prove that $\sum_{i=1}^{n} X_i^2$ and every quadratic form,
which is nonidentically zero in $X_1, X_2, \ldots, X_n$, are dependent.

10.44. Let $X_1, X_2, X_3, X_4$ denote a random sample of size 4 from a
distribution which is $N(0, \sigma^2)$. Let $Y = \sum_{i=1}^{4} a_i X_i$, where $a_1$, $a_2$, $a_3$, and $a_4$ are
real constants. If $Y^2$ and $Q = X_1X_2 - X_3X_4$ are independent, determine $a_1$,
$a_2$, $a_3$, and $a_4$.

10.45. Let $\mathbf{A}$ be the real symmetric matrix of a quadratic form $Q$ in the
observations of a random sample of size $n$ from a distribution which is
$N(0, \sigma^2)$. Given that $Q$ and the mean $\bar{X}$ of the sample are independent,
what can be said of the elements of each row (column) of $\mathbf{A}$?

Hint: Are $Q$ and $\bar{X}^2$ independent?

10.46. Let $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_k$ be the matrices of $k > 2$ quadratic forms
$Q_1, Q_2, \ldots, Q_k$ in the observations of a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Prove that the pairwise independence of these
forms implies that they are mutually independent.

Hint: Show that $\mathbf{A}_i\mathbf{A}_j = \mathbf{0}$, $i \ne j$, permits $E[\exp(t_1Q_1 +
t_2Q_2 + \cdots + t_kQ_k)]$ to be written as a product of the moment-generating
functions of $Q_1, Q_2, \ldots, Q_k$.

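10.47. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$, where $X_1, X_2, \ldots, X_n$ are observations
of a random sample from a distribution which is $N(0, \sigma^2)$. Let
$\mathbf{b}' = [b_1, b_2, \ldots, b_n]$ be a real nonzero matrix, and let $\mathbf{A}$ be a real symmetric
matrix of order $n$. Prove that the linear form $\mathbf{b}'\mathbf{X}$ and the quadratic form
$\mathbf{X}'\mathbf{A}\mathbf{X}$ are independent if and only if $\mathbf{b}'\mathbf{A} = \mathbf{0}$. Use this fact to prove that
$\mathbf{b}'\mathbf{X}$ and $\mathbf{X}'\mathbf{A}\mathbf{X}$ are independent if and only if the two quadratic forms,
$(\mathbf{b}'\mathbf{X})^2 = \mathbf{X}'\mathbf{b}\mathbf{b}'\mathbf{X}$ and $\mathbf{X}'\mathbf{A}\mathbf{X}$, are independent.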

10.48. Let $Q_1$ and $Q_2$ be two nonnegative quadratic forms in the observations
of a random sample from a distribution which is $N(0, \sigma^2)$. Show that
another quadratic form $Q$ is independent of $Q_1 + Q_2$ if and only if $Q$ is
independent of each of $Q_1$ and $Q_2$.

Hint: Consider the orthogonal transformation that diagonalizes the
matrix of $Q_1 + Q_2$. After this transformation, what are the forms of the
matrices of $Q$, $Q_1$, and $Q_2$ if $Q$ and $Q_1 + Q_2$ are independent?

10.49. Prove that Equation (4) of this section implies that the nonzero 
characteristic numbers of the matrices D and D 22 are the same. 

Hint: Let $\lambda = 1/(2t_2)$, $t_2 \ne 0$, and show that Equation (4) is equivalent
to $|\mathbf{D} - \lambda\mathbf{I}| = (-\lambda)^r\,|\mathbf{D}_{22} - \lambda\mathbf{I}_{n-r}|$.

10.50. Here $Q_1$ and $Q_2$ are quadratic forms in observations of a random
sample from $N(0, 1)$. If $Q_1$ and $Q_2$ are independent and if $Q_1 + Q_2$ has a
chi-square distribution, prove that $Q_1$ and $Q_2$ are chi-square variables.

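10.51. Often in regression the mean of the random variable $Y$ is a linear
function of $p$-values $x_1, x_2, \ldots, x_p$, say $\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p$, where
$\boldsymbol{\beta}' = (\beta_1, \beta_2, \ldots, \beta_p)$ are the regression coefficients. Suppose that $n$ values,
$\mathbf{Y}' = (Y_1, Y_2, \ldots, Y_n)$, are observed for the $x$-values in $\mathbf{X} = (x_{ij})$, where $\mathbf{X}$
is an $n \times p$ design matrix and its $i$th row is associated with
$Y_i$, $i = 1, 2, \ldots, n$. Assume that $\mathbf{Y}$ is multivariate normal with mean $\mathbf{X}\boldsymbol{\beta}$
and covariance matrix $\sigma^2\mathbf{I}$, where $\mathbf{I}$ is the $n \times n$ identity matrix.

(a) Note that $Y_1, Y_2, \ldots, Y_n$ are independent. Why?

(b) Since $\mathbf{Y}$ should approximately equal its mean $\mathbf{X}\boldsymbol{\beta}$, we estimate $\boldsymbol{\beta}$ by
solving the normal equations $\mathbf{X}'\mathbf{Y} = \mathbf{X}'\mathbf{X}\boldsymbol{\beta}$ for $\boldsymbol{\beta}$. Assuming that $\mathbf{X}'\mathbf{X}$ is
nonsingular, solve the equations to get $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. Show that $\hat{\boldsymbol{\beta}}$
has a multivariate normal distribution with mean $\boldsymbol{\beta}$ and covariance
matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

(c) Show that

$$(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'(\mathbf{X}'\mathbf{X})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}),$$

say $Q = Q_1 + Q_2$ for convenience.

(d) Show that $Q_1/\sigma^2$ is $\chi^2(p)$.

(e) Show that $Q_1$ and $Q_2$ are independent.

(f) Argue that $Q_2/\sigma^2$ is $\chi^2(n - p)$.

(g) Find $c$ so that $cQ_1/Q_2$ has an $F$-distribution.

(h) The fact that a value $d$ can be found so that $\Pr(cQ_1/Q_2 \le d) = 1 - \alpha$
could be used to find a $100(1 - \alpha)$ percent confidence ellipsoid for $\boldsymbol{\beta}$.
Explain.

(i) If the coefficient matrix $\boldsymbol{\beta}$ has the prior distribution that is multivariate
normal with mean matrix $\boldsymbol{\beta}_0$ and covariance matrix $\boldsymbol{\Sigma}_0$, what is the
posterior distribution of $\boldsymbol{\beta}$, given $\hat{\boldsymbol{\beta}}$?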

10.52. Say that G.P.A. ($Y$) is thought to be a linear function of a "coded" high
school rank ($x_2$) and a "coded" American College Testing score ($x_3$),
namely, $\beta_1 + \beta_2x_2 + \beta_3x_3$. Note that all $x_1$ values equal 1. We observe the
following five points:

    x_1   x_2   x_3    Y
     1     1     2     3
     1     4     3     6
     1     2     2     4
     1     4     2     4
     1     3     2     4


(a) Compute $\mathbf{X}'\mathbf{X}$ and $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.

(b) Compute a 95 percent confidence ellipsoid for $\boldsymbol{\beta}' = (\beta_1, \beta_2, \beta_3)$.

ADDITIONAL EXERCISES

10.53. Let $\mu_1, \mu_2, \mu_3$ be, respectively, the means of three normal distributions
with a common but unknown variance $\sigma^2$. In order to test, at the $\alpha = 5$
percent significance level, the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ against all
possible alternative hypotheses, we take an independent random sample of
size 4 from each of these distributions. Determine whether we accept or
reject $H_0$ if the observed values from these three distributions are,
respectively,

    X_1:   5    9    6    8
    X_2:  11   13   10   12
    X_3:  10    6    9    9

10.54. The driver of a diesel-powered automobile decided to test the quality
of three types of diesel fuel sold in the area based on mpg. Test the null
hypothesis that the three means are equal using the following data. Make
the usual assumptions and take $\alpha = 0.05$.

    Brand A: 38.7 39.2 40.1 38.9
    Brand B: 41.9 42.3 41.3
    Brand C: 40.8 41.2 39.5 38.9 40.3

10.55. We wish to compare compressive strengths of concrete corresponding
to $a = 3$ different drying methods (treatments). Concrete is mixed in batches
that are just large enough to produce three cylinders. Although care is
taken to achieve uniformity, we expect some variability among the $b = 5$
batches used to obtain the following compressive strengths. (There is little
reason to suspect interaction and hence only one observation is taken in
each cell.)

                          Batch
    Treatment   B_1   B_2   B_3   B_4   B_5
    A_1          52    47    44    51    42
    A_2          60    55    49    52    43
    A_3          56    48    45    44    38

(a) Use the 5 percent significance level and test $H_A: \alpha_1 = \alpha_2 = \alpha_3 = 0$
against all alternatives.

(b) Use the 5 percent significance level and test $H_B: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
against all alternatives.


10.56. With $a = 3$ and $b = 4$, find $\mu$, $\alpha_i$, $\beta_j$, and $\gamma_{ij}$ if $\mu_{ij}$, $i = 1, 2, 3$ and
$j = 1, 2, 3, 4$, are given by

     6    7    7   12
    10    3   11    8
     8    5    9   10


10.57. Two experiments gave the following results:

      n    $\bar{x}$   $\bar{y}$   $s_x$   $s_y$     r
    100     10     20      5      8    0.70
    200     12     22      6     10    0.80

Calculate $r$ for the combined sample.

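10.58. Consider the following matrices: $\mathbf{Y}$ is $n \times 1$, $\boldsymbol{\beta}$ is $p \times 1$, $\mathbf{X}$ is $n \times p$ and
of rank $p$. Let $\mathbf{Y}$ be $N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$. Discuss the joint p.d.f. of $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
and $\mathbf{Y}'[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{Y}/\sigma^2$.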

10.59. Fit $y = a + x$ to the data

    x   0   1   2
    y   1   3   4

by the method of least squares.

10.60. Fit by the method of least squares the plane z = a + bx + cy to the 
five points $(x, y, z)$: $(-1, -2, 5)$, $(0, -2, 4)$, $(0, 0, 4)$, $(1, 0, 2)$, $(2, 1, 0)$.

10.61. Let the $4 \times 1$ matrix $\mathbf{Y}$ be multivariate normal $N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where the
$4 \times 3$ design matrix equals



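and $\boldsymbol{\beta}$ is the $3 \times 1$ regression coefficient matrix.

(a) Find the mean matrix and the covariance matrix of $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.

(b) If we observe $\mathbf{Y}'$ to be equal to $(6, 1, 11, 3)$, compute $\hat{\boldsymbol{\beta}}$.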

10.62. Let the independent normal random variables $Y_1, Y_2, \ldots, Y_n$ have,
respectively, the probability density functions $N(\mu, \gamma^2x_i^2)$, $i = 1, 2, \ldots, n$,
where the given $x_1, x_2, \ldots, x_n$ are not all equal and no one of which is zero.
Discuss the test of the hypothesis $H_0: \gamma = 1$, $\mu$ unspecified, against all
alternatives $H_1: \gamma \ne 1$, $\mu$ unspecified.

10.63. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent normal variables with common
unknown variance $\sigma^2$. Let $Y_i$ have mean $\beta x_i$, $i = 1, 2, \ldots, n$, where
$x_1, x_2, \ldots, x_n$ are known but not all the same and $\beta$ is an unknown
constant. Find the likelihood ratio test for $H_0: \beta = 0$ against all
alternatives. Show that this likelihood ratio test can be based on a statistic
that has a well-known distribution.


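10.64. Consider the multivariate normal p.d.f. $f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ where the known
parameters equal either $\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1$ or $\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2$, respectively.

(a) If $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$ is known to equal $\boldsymbol{\Sigma}$, classify $\mathbf{X}$ as being in the second of these
distributions if

$$\frac{f(\mathbf{x}; \boldsymbol{\mu}_1, \boldsymbol{\Sigma})}{f(\mathbf{x}; \boldsymbol{\mu}_2, \boldsymbol{\Sigma})} \le k;$$

otherwise, $\mathbf{X}$ is classified as being from the first distribution. Show that
this rule is based upon a linear function of $\mathbf{X}$ and determine its
distribution. This allows us to compute the probabilities of
misclassification.

(b) If $\boldsymbol{\Sigma}_1$ and $\boldsymbol{\Sigma}_2$ are different but known, show that

$$\frac{f(\mathbf{x}; \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)}{f(\mathbf{x}; \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)} \le k$$

can be based upon a second degree polynomial in $\mathbf{X}$. When either $\boldsymbol{\Sigma}_1$
or $\boldsymbol{\Sigma}_2$ is the correct covariance matrix, does this expression have a
chi-square distribution?


Nonparametric Methods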


11.1 Confidence Intervals for Distribution Quantiles 

We shall first define the concept of a quantile of a distribution of
a random variable of the continuous type. Let $X$ be a random variable
of the continuous type with p.d.f. $f(x)$ and distribution function $F(x)$.
Let $p$ denote a positive proper fraction and assume that the equation
$F(x) = p$ has a unique solution for $x$. This unique root is denoted by
the symbol $\xi_p$ and is called the quantile (of the distribution) of order
$p$. Thus $\Pr(X \le \xi_p) = F(\xi_p) = p$. For example, the quantile of order $\tfrac{1}{2}$
is the median of the distribution and $\Pr(X \le \xi_{0.5}) = \tfrac{1}{2}$.

In Chapter 6 we computed the probability that a certain random 
interval includes a special point. Frequently, this special point was a 
parameter of the distribution of probability under consideration. 
Thus we are led to the notion of an interval estimate of a parameter. 
If the parameter happens to be a quantile of the distribution, and if we 
work with certain functions of the order statistics, it will be seen that 
this method of statistical inference is applicable to all distributions
of the continuous type. We call these methods distribution-free
or nonparametric methods of inference.

To obtain a distribution-free confidence interval for $\xi_p$, the quantile
of order $p$, of a distribution of the continuous type with distribution
function $F(x)$, take a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from that
distribution. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of the
sample. Take $Y_i < Y_j$ and consider the event $Y_i < \xi_p < Y_j$. For the $i$th
order statistic $Y_i$ to be less than $\xi_p$ it must be true that at least $i$ of the
$X$ values are less than $\xi_p$. Moreover, for the $j$th order statistic to be
greater than $\xi_p$, fewer than $j$ of the $X$ values are less than $\xi_p$. That is,
if we say that we have a "success" when an individual $X$ value is less
than $\xi_p$, then, in the $n$ independent trials, there must be at least $i$
successes but fewer than $j$ successes for the event $Y_i < \xi_p < Y_j$ to
occur. But since the probability of success on each trial is
$\Pr(X < \xi_p) = F(\xi_p) = p$, the probability of this event is

$$\Pr(Y_i < \xi_p < Y_j) = \sum_{w=i}^{j-1} \frac{n!}{w!\,(n - w)!}\, p^w (1 - p)^{n-w},$$

the probability of having at least $i$, but less than $j$, successes. When
particular values of $n$, $i$, and $j$ are specified, this probability can be
computed. By this procedure, suppose it has been found that
$\gamma = \Pr(Y_i < \xi_p < Y_j)$. Then the probability is $\gamma$ that the random
interval $(Y_i, Y_j)$ includes the quantile of order $p$. If the experimental
values of $Y_i$ and $Y_j$ are, respectively, $y_i$ and $y_j$, the interval $(y_i, y_j)$ serves
as a $100\gamma$ percent confidence interval for $\xi_p$, the quantile of order $p$.

An illustrative example follows. 


Example 1. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random
sample of size 4 from a distribution of the continuous type. The probability
that the random interval $(Y_1, Y_4)$ includes the median $\xi_{0.5}$ of the distribution
will be computed. We have

$$\Pr(Y_1 < \xi_{0.5} < Y_4) = \sum_{w=1}^{3} \frac{4!}{w!\,(4 - w)!} \left(\frac{1}{2}\right)^{4} = 0.875.$$

If $Y_1$ and $Y_4$ are observed to be $y_1 = 2.8$ and $y_4 = 4.2$, respectively, the interval
(2.8, 4.2) is an 87.5 percent confidence interval for the median $\xi_{0.5}$ of the
distribution.
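
The binomial sum that gives the confidence coefficient is simple to evaluate directly. The following is a minimal sketch, with a function name `quantile_ci_coefficient` chosen here for illustration; it evaluates $\Pr(Y_i < \xi_p < Y_j)$ as the probability of at least $i$ but fewer than $j$ successes in $n$ trials, and is applied to the setting of Example 1.

```python
from math import comb


def quantile_ci_coefficient(n, i, j, p):
    """Pr(Y_i < xi_p < Y_j): at least i but fewer than j successes in n trials."""
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(i, j))


# Example 1: n = 4, interval (Y_1, Y_4), median (p = 1/2).
gamma_example_1 = quantile_ci_coefficient(4, 1, 4, 0.5)
print(gamma_example_1)  # 0.875
```

Because the sum depends only on $n$, $i$, $j$, and $p$, the same function serves for any continuous distribution, which is what makes the interval distribution-free.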


For samples of fairly large size, we can approximate the binomial
probabilities with those associated with normal distributions, as
illustrated in the next example.

Example 2. Let the following numbers represent the values of the order 
statistics of n = 27 observations obtained in a random sample from a certain 
distribution of the continuous type: 

61, 69, 71, 74, 79, 80, 83, 84, 86, 87, 92, 93, 96, 100, 

104, 105, 113, 121, 122, 129, 141, 143, 156, 164, 191, 217, 276. 

Say that we are interested in estimating the 25th percentile $\xi_{0.25}$ (that is, the
quantile of order 0.25) of the distribution. Since $(n + 1)p = 28(\tfrac{1}{4}) = 7$, the
seventh order statistic, $y_7 = 83$, could serve as a point estimate of $\xi_{0.25}$. To get
a confidence interval for $\xi_{0.25}$, consider two order statistics, one less than $y_7$
and the other greater, for illustration, $y_4$ and $y_{10}$. What is the confidence
coefficient associated with the interval $(y_4, y_{10})$? Of course, before the sample
is drawn, we know that

$$\gamma = \Pr(Y_4 < \xi_{0.25} < Y_{10}) = \sum_{w=4}^{9} \binom{27}{w} (0.25)^w (0.75)^{27-w}.$$

That is,

$$\gamma = \Pr(3.5 < W < 9.5),$$

where $W$ is $b(27, \tfrac{1}{4})$ with mean $\tfrac{27}{4} = 6.75$ and variance $\tfrac{81}{16}$. Hence $\gamma$ is
approximately equal to

$$\Phi\!\left(\frac{9.5 - 6.75}{9/4}\right) - \Phi\!\left(\frac{3.5 - 6.75}{9/4}\right) = \Phi\!\left(\tfrac{11}{9}\right) - \Phi\!\left(-\tfrac{13}{9}\right) = 0.814.$$

Thus $(y_4 = 74, y_{10} = 87)$ serves as an approximate 81.4 percent confidence
interval for $\xi_{0.25}$. It should be noted that we could choose other intervals also,
for illustration, $(y_3 = 71, y_{11} = 92)$, and these would have different confidence
coefficients. The persons involved in the study must select the desired
confidence coefficient, and then the appropriate order statistics, $Y_i$ and $Y_j$,
are taken in such a way that $i$ and $j$ are fairly symmetrically located about
$(n + 1)p$.
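
The normal approximation in Example 2 can be compared with the exact binomial sum. The sketch below is an illustration only; the helper `std_normal_cdf` is an assumption introduced here (it builds the standard normal distribution function from `math.erf`), and the continuity-corrected limits 3.5 and 9.5 are those of the example.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, p = 27, 0.25
i, j = 4, 10  # interval (Y_4, Y_10)

# Exact confidence coefficient: Pr(4 <= W <= 9) with W ~ b(27, 1/4).
exact = sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(i, j))

# Normal approximation with continuity correction: Pr(3.5 < W < 9.5).
mean = n * p                  # 6.75
sd = sqrt(n * p * (1 - p))    # sqrt(81/16) = 2.25
approx = (std_normal_cdf((j - 0.5 - mean) / sd)
          - std_normal_cdf((i - 0.5 - mean) / sd))

print(round(exact, 3), round(approx, 3))
```

The approximate value agrees with the 0.814 of the example up to table rounding, and the exact binomial sum differs from it only in the second decimal place.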


EXERCISES 

11.1. Let $Y_n$ denote the $n$th order statistic of a random sample of size $n$ from
a distribution of the continuous type. Find the smallest value of $n$ for which
$\Pr(\xi_{0.9} < Y_n) \ge 0.75$.

11.2. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ denote the order statistics of a random
sample of size 5 from a distribution of the continuous type. Compute:

(a) $\Pr(Y_1 < \xi_{0.5} < Y_5)$.

(b) $\Pr(Y_1 < \xi_{0.25} < Y_3)$.

(c) $\Pr(Y_4 < \xi_{0.80} < Y_5)$.

11.3. Compute $\Pr(Y_3 < \xi_{0.5} < Y_7)$ if $Y_1 < \cdots < Y_9$ are the order statistics of
a random sample of size 9 from a distribution of the continuous type.

11.4. Find the smallest value of $n$ for which $\Pr(Y_1 < \xi_{0.5} < Y_n) \ge 0.99$, where
$Y_1 < \cdots < Y_n$ are the order statistics of a random sample of size $n$ from a
distribution of the continuous type.

11.5. Let $Y_1 < Y_2$ denote the order statistics of a random sample of size 2 from
a distribution which is $N(\mu, \sigma^2)$, where $\sigma^2$ is known.

(a) Show that $\Pr(Y_1 < \mu < Y_2) = \tfrac{1}{2}$ and compute the expected value of the
random length $Y_2 - Y_1$.

(b) If $\bar{X}$ is the mean of this sample, find the constant $c$ such that
$\Pr(\bar{X} - c\sigma < \mu < \bar{X} + c\sigma) = \tfrac{1}{2}$, and compare the length of this random
interval with the expected value of that of part (a).

11.6. Let $Y_1 < Y_2 < \cdots < Y_{25}$ be the order statistics of a random sample of
size $n = 25$ from a distribution of the continuous type. Compute
approximately:

(a) $\Pr(Y_8 < \xi_{0.5} < Y_{18})$.

(b) $\Pr(Y_2 < \xi_{0.2} < Y_9)$.

(c) $\Pr(Y_{18} < \xi_{0.8} < Y_{23})$.

11.7. Let $Y_1 < Y_2 < \cdots < Y_{100}$ be the order statistics of a random sample of
size $n = 100$ from a distribution of the continuous type. Find $i < j$ so that
$\Pr(Y_i < \xi_{0.2} < Y_j)$ is about equal to 0.95.

11.8. Let $\xi_{1/4}$ be the 25th percentile of a distribution of the continuous type.
Let $Y_1 < Y_2 < \cdots < Y_{48}$ be the order statistics of a random sample of size
$n = 48$ from this distribution.

(a) In terms of "binomial probabilities," to what is $\Pr(Y_9 < \xi_{1/4} < Y_{16})$
equal?

(b) How would you approximate this answer with "normal probabilities"?

(c) Find $i$ such that $\Pr(Y_{12-i} < \xi_{1/4} < Y_{13+i})$ is as close as possible to 0.95
(using the normal approximation).

11.2 Tolerance Limits for Distributions 

We propose now to investigate a problem that has something of 
the same flavor as that treated in Section 11.1. Specifically, can we 
compute the probability that a certain random interval includes (or 
covers) a preassigned percentage of the probability for the distribution
under consideration? And, by appropriate selection of the
random interval, can we be led to an additional distribution-free 
method of statistical inference? 

Let $X$ be a random variable with distribution function $F(x)$ of the
continuous type. The random variable $Z = F(X)$ is an important
random variable, and its distribution is given in Example 1, Section 4.1.
It is our purpose now to make an interpretation. Since $Z = F(X)$ has
the p.d.f.

$$h(z) = 1, \quad 0 < z < 1,$$
$$\phantom{h(z)} = 0 \quad \text{elsewhere},$$

then, if $0 < p < 1$, we have

$$\Pr[F(X) \le p] = \int_0^p dz = p.$$

Now $F(x) = \Pr(X \le x)$. Since $\Pr(X = x) = 0$, then $F(x)$ is the
fractional part of the probability for the distribution of $X$ that is
between $-\infty$ and $x$. If $F(x) \le p$, then no more than $100p$ percent of
the probability for the distribution of $X$ is between $-\infty$ and $x$. But
recall $\Pr[F(X) \le p] = p$. That is, the probability that the random
variable $Z = F(X)$ is less than or equal to $p$ is precisely the probability
that the random interval $(-\infty, X)$ contains no more than $100p$ percent
of the probability for the distribution. For example, the probability
that the random interval $(-\infty, X)$ contains no more than 70 percent
of the probability for the distribution is 0.70; and the probability that
the random interval $(-\infty, X)$ contains more than 70 percent of the
probability for the distribution is $1 - 0.70 = 0.30$.
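
The statement $\Pr[F(X) \le p] = p$ can be seen in a quick simulation. The sketch below is an illustration under assumptions made here: it uses an exponential distribution with rate 1 as the continuous distribution, a fixed seed, and 100,000 draws; none of these choices comes from the text.

```python
import math
import random


def exp_cdf(x, rate=1.0):
    """Distribution function F(x) of an exponential distribution, x >= 0."""
    return 1.0 - math.exp(-rate * x)


rng = random.Random(12345)   # fixed seed so the run is reproducible
num_draws = 100_000
p = 0.70

# Count how often the interval (-inf, X) holds no more than 100p percent
# of the probability, that is, how often F(X) <= p.
hits = sum(exp_cdf(rng.expovariate(1.0)) <= p for _ in range(num_draws))
proportion = hits / num_draws

print(proportion)
```

The observed proportion should be close to 0.70, and the same would hold for any other continuous distribution substituted for the exponential, since $F(X)$ is uniform on $(0, 1)$ in every such case.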

We now consider certain functions of the order statistics. Let
$X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a distribution
that has a positive and continuous p.d.f. $f(x)$ if and only if $a < x < b$;
and let $F(x)$ denote the associated distribution function. Consider
the random variables $F(X_1), F(X_2), \ldots, F(X_n)$. These random
variables are independent and each, in accordance with Example 1,
Section 4.1, has a uniform distribution on the interval (0, 1). Thus
$F(X_1), F(X_2), \ldots, F(X_n)$ is a random sample of size $n$ from a uniform
distribution on the interval (0, 1). Consider the order statistics of this
random sample $F(X_1), F(X_2), \ldots, F(X_n)$. Let $Z_1$ be the smallest of these
$F(X_i)$, $Z_2$ the next $F(X_i)$ in order of magnitude, ..., and $Z_n$ the largest
$F(X_i)$. If $Y_1, Y_2, \ldots, Y_n$ are the order statistics of the initial random
sample $X_1, X_2, \ldots, X_n$, the fact that $F(x)$ is a nondecreasing
(here, strictly increasing) function of $x$ implies that $Z_1 = F(Y_1)$,
$Z_2 = F(Y_2), \ldots, Z_n = F(Y_n)$. Thus the joint p.d.f. of $Z_1, Z_2, \ldots, Z_n$ is
given by

$$h(z_1, z_2, \ldots, z_n) = n!, \quad 0 < z_1 < z_2 < \cdots < z_n < 1,$$
$$\phantom{h(z_1, z_2, \ldots, z_n)} = 0 \quad \text{elsewhere}.$$

This proves a special case of the following theorem.

Theorem 1. Let $Y_1, Y_2, \ldots, Y_n$ denote the order statistics of a
random sample of size $n$ from a distribution of the continuous type that
has p.d.f. $f(x)$ and distribution function $F(x)$. The joint p.d.f. of the
random variables $Z_i = F(Y_i)$, $i = 1, 2, \ldots, n$, is

$$h(z_1, z_2, \ldots, z_n) = n!, \quad 0 < z_1 < z_2 < \cdots < z_n < 1,$$
$$\phantom{h(z_1, z_2, \ldots, z_n)} = 0 \quad \text{elsewhere}.$$

Because the distribution function of $Z = F(X)$ is given by $z$,
$0 < z < 1$, the marginal p.d.f. of $Z_k = F(Y_k)$ is the following beta p.d.f.:

$$
h_k(z_k) =
\begin{cases}
\dfrac{n!}{(k-1)!\,(n-k)!}\, z_k^{k-1}(1 - z_k)^{n-k}, & 0 < z_k < 1, \\
0 & \text{elsewhere.}
\end{cases}
\tag{1}
$$

Moreover, the joint p.d.f. of $Z_i = F(Y_i)$ and $Z_j = F(Y_j)$ is, with $i < j$,
given by

$$
h_{ij}(z_i, z_j) =
\begin{cases}
\dfrac{n!\, z_i^{i-1}(z_j - z_i)^{j-i-1}(1 - z_j)^{n-j}}{(i-1)!\,(j-i-1)!\,(n-j)!}, & 0 < z_i < z_j < 1, \\
0 & \text{elsewhere.}
\end{cases}
\tag{2}
$$


Consider the difference $Z_j - Z_i = F(Y_j) - F(Y_i)$, $i < j$. Now
$F(y_j) = \Pr(X \le y_j)$ and $F(y_i) = \Pr(X \le y_i)$. Since $\Pr(X = y_i) =
\Pr(X = y_j) = 0$, then the difference $F(y_j) - F(y_i)$ is that fractional part
of the probability for the distribution of $X$ that is between $y_i$ and $y_j$.
Let $p$ denote a positive proper fraction. If $F(y_j) - F(y_i) \ge p$, then at
least $100p$ percent of the probability for the distribution of $X$ is between
$y_i$ and $y_j$. Let it be given that $\gamma = \Pr[F(Y_j) - F(Y_i) \ge p]$. Then the
random interval $(Y_i, Y_j)$ has probability $\gamma$ of containing at least $100p$
percent of the probability for the distribution of $X$. If now $y_i$ and $y_j$
denote, respectively, experimental values of $Y_i$ and $Y_j$, the interval
$(y_i, y_j)$ either does or does not contain at least $100p$ percent
of the probability for the distribution of $X$. However, we refer to the
interval $(y_i, y_j)$ as a $100\gamma$ percent tolerance interval for $100p$ percent of
the probability for the distribution of $X$. In like vein, $y_i$ and $y_j$ are called
$100\gamma$ percent tolerance limits for $100p$ percent of the probability for the
distribution of $X$.

One way to compute the probability $\gamma = \Pr[F(Y_j) - F(Y_i) \ge p]$ is
to use Equation (2), which gives the joint p.d.f. of $Z_i = F(Y_i)$ and
$Z_j = F(Y_j)$. The required probability is then given by

$$
\gamma = \Pr(Z_j - Z_i \ge p)
= \int_0^{1-p} \left[ \int_{p + z_i}^{1} h_{ij}(z_i, z_j)\, dz_j \right] dz_i .
$$

Sometimes, this is a rather tedious computation. For this reason and
for the reason that coverages are important in distribution-free
statistical inference, we choose to introduce at this time the concept of
a coverage.

Consider the random variables $W_1 = F(Y_1) = Z_1$, $W_2 = F(Y_2) -
F(Y_1) = Z_2 - Z_1$, $W_3 = F(Y_3) - F(Y_2) = Z_3 - Z_2$, ..., $W_n = F(Y_n) -
F(Y_{n-1}) = Z_n - Z_{n-1}$. The random variable $W_1$ is called a coverage of
the random interval $\{x : -\infty < x < Y_1\}$ and the random variable $W_i$,
$i = 2, 3, \ldots, n$, is called a coverage of the random interval
$\{x : Y_{i-1} < x < Y_i\}$. We shall find the joint p.d.f. of the $n$ coverages
$W_1, W_2, \ldots, W_n$. First we note that the inverse functions of the
associated transformation are given by

$$
\begin{aligned}
z_1 &= w_1, \\
z_2 &= w_1 + w_2, \\
z_3 &= w_1 + w_2 + w_3, \\
&\;\;\vdots \\
z_n &= w_1 + w_2 + w_3 + \cdots + w_n .
\end{aligned}
$$

We also note that the Jacobian is equal to 1 and that the space of
positive probability density is

$$
\{(w_1, w_2, \ldots, w_n) : 0 < w_i,\ i = 1, 2, \ldots, n,\ w_1 + \cdots + w_n < 1\}.
$$

Since the joint p.d.f. of $Z_1, Z_2, \ldots, Z_n$ is $n!$, $0 < z_1 < z_2 < \cdots < z_n < 1$,
zero elsewhere, the joint p.d.f. of the $n$ coverages is

$$
k(w_1, \ldots, w_n) =
\begin{cases}
n!, & 0 < w_i,\ i = 1, \ldots, n,\ w_1 + \cdots + w_n < 1, \\
0 & \text{elsewhere.}
\end{cases}
$$

A reexamination of Example 1 of Section 4.5 reveals that this is a
Dirichlet p.d.f. with $k = n$ and $\alpha_1 = \alpha_2 = \cdots = \alpha_{n+1} = 1$.

Because the p.d.f. $k(w_1, \ldots, w_n)$ is symmetric in $w_1, w_2, \ldots, w_n$, it
is evident that the distribution of every sum of $r$, $r < n$, of these
coverages $W_1, \ldots, W_n$ is exactly the same for each fixed value of $r$.
For instance, if $i < j$ and $r = j - i$, the distribution of $Z_j - Z_i =
F(Y_j) - F(Y_i) = W_{i+1} + W_{i+2} + \cdots + W_j$ is exactly the same as that
of $Z_{j-i} = F(Y_{j-i}) = W_1 + W_2 + \cdots + W_{j-i}$. But we know that the
p.d.f. of $Z_{j-i}$ is the beta p.d.f. of the form

$$
h_{j-i}(v) =
\begin{cases}
\dfrac{\Gamma(n+1)}{\Gamma(j-i)\,\Gamma(n-j+i+1)}\, v^{j-i-1}(1 - v)^{n-j+i}, & 0 < v < 1, \\
0 & \text{elsewhere.}
\end{cases}
$$

Consequently, $F(Y_j) - F(Y_i)$ has this p.d.f. and

$$
\Pr[F(Y_j) - F(Y_i) \ge p] = \int_p^1 h_{j-i}(v)\, dv .
$$

Example 1. Let $Y_1 < Y_2 < \cdots < Y_6$ be the order statistics of a random
sample of size 6 from a distribution of the continuous type. We want to use
the observed interval $(y_1, y_6)$ as a tolerance interval for 80 percent of the
distribution. Then

$$
\gamma = \Pr[F(Y_6) - F(Y_1) \ge 0.8] = 1 - \int_0^{0.8} 30 v^4 (1 - v)\, dv,
$$

because the integrand is the p.d.f. of $F(Y_6) - F(Y_1)$. Accordingly,

$$
\gamma = 1 - 6(0.8)^5 + 5(0.8)^6 = 0.34,
$$

approximately. That is, the observed values of $Y_1$ and $Y_6$ will define a 34
percent tolerance interval for 80 percent of the probability for the distribution.
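
As a rough numerical check of Example 1, the following Python sketch
evaluates $\Pr[F(Y_j) - F(Y_i) \ge p]$ for general $n$, $i < j$, and $p$. It
assumes the standard identity that the upper tail of a beta distribution with
integer parameters equals a binomial lower tail; the function name
`tolerance_gamma` is our own and does not come from the text.

```python
from math import comb

def tolerance_gamma(n, i, j, p):
    """Pr[F(Y_j) - F(Y_i) >= p] for a sample of size n, with i < j."""
    r = j - i  # the difference is a sum of r coverages, hence beta(r, n - r + 1)
    # Upper tail of beta(r, n - r + 1) at p equals Pr[Binomial(n, p) <= r - 1].
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(r))

gamma = tolerance_gamma(6, 1, 6, 0.8)   # Example 1: n = 6, interval (y_1, y_6)
print(round(gamma, 2))                  # compare with 1 - 6(0.8)^5 + 5(0.8)^6
```

The same function can be used to explore how the confidence level changes
with the sample size or with the choice of order statistics.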

Example 2. Each of the coverages $W_i$, $i = 1, 2, \ldots, n$, has the beta p.d.f.

$$
k_1(w) =
\begin{cases}
n(1 - w)^{n-1}, & 0 < w < 1, \\
0 & \text{elsewhere,}
\end{cases}
$$

because $W_1 = Z_1 = F(Y_1)$ has this p.d.f. Accordingly, the mathematical
expectation of each $W_i$ is

$$
\int_0^1 n w (1 - w)^{n-1}\, dw = \frac{1}{n+1} .
$$

Now the coverage $W_i$ can be thought of as the area under the graph of the
p.d.f. $f(x)$, above the $x$-axis, and between the lines $x = Y_{i-1}$ and $x = Y_i$. (We
take $Y_0 = -\infty$.) Thus the expected value of each of these random areas $W_i$,
$i = 1, 2, \ldots, n$, is $1/(n+1)$. That is, the order statistics partition the
probability for the distribution into $n + 1$ parts, and the expected value of each
of these parts is $1/(n+1)$. More generally, the expected value of $F(Y_j) - F(Y_i)$,
$i < j$, is $(j - i)/(n+1)$, since $F(Y_j) - F(Y_i)$ is the sum of $j - i$ of these
coverages. This result provides a reason for calling $Y_k$, where $(n+1)p = k$,
the $(100p)$th percentile of the sample, since

$$
E[F(Y_k)] = \frac{k}{n+1} = \frac{(n+1)p}{n+1} = p .
$$
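
The claim that each coverage has expectation $1/(n+1)$ can be illustrated by
simulation. The sketch below is only a Monte Carlo illustration, not part of
the text's argument; it assumes we may work directly with uniform variates,
since each $F(X_i)$ is uniform on (0, 1), and the names `mean_coverages`,
`trials`, and `seed` are our own.

```python
import random

def mean_coverages(n, trials=20000, seed=0):
    """Monte Carlo estimate of E(W_1), ..., E(W_n) for a sample of size n."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        # F(X_i) is uniform on (0, 1), so sorted uniforms stand in for Z_1 < ... < Z_n.
        z = sorted(rng.random() for _ in range(n))
        prev = 0.0
        for k, zk in enumerate(z):
            totals[k] += zk - prev   # coverage W_{k+1} = Z_{k+1} - Z_k, with Z_0 = 0
            prev = zk
    return [t / trials for t in totals]

print([round(w, 3) for w in mean_coverages(4)])  # each entry should be near 1/5
```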


EXERCISES 

11.9. Let $Y_1$ and $Y_n$ be, respectively, the first and $n$th order statistics of a
random sample of size $n$ from a distribution of the continuous type having
distribution function $F(x)$. Find the smallest value of $n$ such that
$\Pr[F(Y_n) - F(Y_1) \ge 0.5]$ is at least 0.95.

11.10. Let $Y_2$ and $Y_{n-1}$ denote the second and the $(n-1)$st order statistics
of a random sample of size $n$ from a distribution of the continuous type
having distribution function $F(x)$. Compute $\Pr[F(Y_{n-1}) - F(Y_2) \ge p]$,
where $0 < p < 1$.

11.11. Let $Y_1 < Y_2 < \cdots < Y_{48}$ be the order statistics of a random sample
of size 48 from a distribution of the continuous type. We want to use the
observed interval $(y_4, y_{45})$ as a $100\gamma$ percent tolerance interval for 75 percent
of the distribution.

(a) To what is $\gamma$ equal?

(b) Approximate the integral in part (a) by noting that it can be written as
a partial sum of a binomial p.d.f., which in turn can be approximated
by probabilities associated with a normal distribution.

11.12. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from a distribution of the continuous type having distribution
function $F(x)$.

(a) What is the distribution of $U = 1 - F(Y_j)$?
(b) Determine the distribution of $V = F(Y_n) - F(Y_j) + F(Y_i) - F(Y_1)$,
where $i < j$.

11.13. Let $Y_1 < Y_2 < \cdots < Y_{10}$ be the order statistics of a random sample
from a continuous-type distribution with distribution function $F(x)$. What
is the joint distribution of $V_1 = F(Y_4) - F(Y_2)$ and $V_2 = F(Y_{10}) - F(Y_6)$?



11.3 The Sign Test 


Some of the chi-square tests of Section 6.6 are illustrative of the
type of tests that we investigate in the remainder of this chapter. Recall,
in that section, we tested the hypothesis that the distribution of a
certain random variable $X$ is a specified distribution. We did this in the
following manner. The space of $X$ was partitioned into $k$ mutually
disjoint sets $A_1, A_2, \ldots, A_k$. The probability $p_{i0}$ that $X \in A_i$ was
computed under the assumption that the specified distribution is the
correct distribution, $i = 1, 2, \ldots, k$. The original hypothesis was then
replaced by the less restrictive hypothesis

$$
H_0 : \Pr(X \in A_i) = p_{i0}, \quad i = 1, 2, \ldots, k,
$$

and a chi-square test, based upon a statistic that was denoted by $Q_{k-1}$,
was used to test the hypothesis $H_0$ against all alternative hypotheses.

There is a certain subjective element in the use of this test, namely
the choice of $k$ and of $A_1, A_2, \ldots, A_k$. But it is important to note that
the limiting distribution of $Q_{k-1}$, under $H_0$, is $\chi^2(k-1)$; that is, the
distribution of $Q_{k-1}$ is free of $p_{10}, p_{20}, \ldots, p_{k0}$ and, accordingly, of the
specified distribution of $X$. Here, and elsewhere, "under $H_0$" means
when $H_0$ is true. A test of a hypothesis $H_0$ based upon a statistic whose
distribution, under $H_0$, does not depend upon the specified distribution
or any parameters of that distribution is called a distribution-free or a
nonparametric test.

Next, let $F(x)$ be the unknown distribution function of the random
variable $X$. Let there be given two numbers $\xi$ and $p_0$, where $0 < p_0 < 1$.
We wish to test the hypothesis $H_0 : F(\xi) = p_0$, that is, the hypothesis
that $\xi = \xi_{p_0}$, the quantile of order $p_0$ of the distribution of $X$. We could
use the statistic $Q_{k-1}$, with $k = 2$, to test $H_0$ against all alternatives.
Suppose, however, that we are interested only in the alternative
hypothesis, which is $H_1 : F(\xi) > p_0$. One procedure is to base the test
of $H_0$ against $H_1$ upon the random variable $Y$, which is the number of
observations less than or equal to $\xi$ in a random sample of size $n$ from
the distribution. The statistic $Y$ can be thought of as the number of
"successes" throughout $n$ independent trials. Then, if $H_0$ is true, $Y$
is $b[n, p_0 = F(\xi)]$; whereas if $H_0$ is false, $Y$ is $b[n, p = F(\xi)]$ whatever be
the distribution function $F(x)$. We reject $H_0$ and accept $H_1$ if and
only if the observed value $y \ge c$, where $c$ is an integer selected
such that $\Pr(Y \ge c; H_0)$ is some reasonable significance level $\alpha$. The
power function of the test is given by

$$
K(p) = \sum_{y=c}^{n} \binom{n}{y} p^y (1 - p)^{n-y}, \quad p_0 \le p < 1,
$$

where $p = F(\xi)$. In certain instances, we may wish to approximate
$K(p)$ by using an approximation to the binomial distribution.

Suppose that the alternative hypothesis to $H_0 : F(\xi) = p_0$ is
$H_1 : F(\xi) < p_0$. Then the critical region is a set $\{y : y \le c_1\}$. Finally, if
the alternative hypothesis is $H_1 : F(\xi) \ne p_0$, the critical region is a set
$\{y : y \le c_2 \text{ or } c_3 \le y\}$.

Frequently, $p_0 = \frac{1}{2}$ and, in that case, the hypothesis is that the given
number $\xi$ is a median of the distribution. In the following example,
this value of $p_0$ is used.

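Example 1. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a
distribution with distribution function $F(x)$. We wish to test the hypothesis
$H_0 : F(72) = \frac{1}{2}$ against the alternative hypothesis $H_1 : F(72) > \frac{1}{2}$. Let $Y$ be the
number of sample items that are less than or equal to 72. Let the observed
value of $Y$ be $y$, and let the test be defined by the critical region $\{y : y \ge 8\}$.
The power function of the test is given by

$$
K(p) = \sum_{y=8}^{10} \binom{10}{y} p^y (1 - p)^{10-y}, \quad \tfrac{1}{2} \le p < 1,
$$

where $p = F(72)$. In particular, the significance level is

$$
\alpha = K\left(\tfrac{1}{2}\right) = \sum_{y=8}^{10} \binom{10}{y} \left(\tfrac{1}{2}\right)^{10} = \frac{7}{128} .
$$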

In many places in the literature, the test that we have just described
is called the sign test. The reason for this terminology is that the test
is based upon a statistic $Y$ that is equal to the number of nonpositive
signs in the sequence $X_1 - \xi, X_2 - \xi, \ldots, X_n - \xi$. In the next section
a distribution-free test, which considers both the sign and the
magnitude of each deviation $X_i - \xi$, is studied.
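
The power function of the sign test is a binomial tail sum, so it is easy to
evaluate numerically. The following Python sketch does this for the
one-sided test that rejects when $y \ge c$; the function name
`sign_test_power` is our own, and the call with $p = 0.8$ is simply an
illustrative alternative that the text does not discuss.

```python
from math import comb

def sign_test_power(n, c, p):
    """K(p) = Pr(Y >= c) when Y is b(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c, n + 1))

alpha = sign_test_power(10, 8, 0.5)   # significance level for n = 10, c = 8
print(alpha)                          # equals 56/1024
print(sign_test_power(10, 8, 0.8))    # power at the alternative F(72) = 0.8
```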

EXERCISES 

11.14. Suggest a chi-square test of the hypothesis which states that a
distribution is one of the beta type, with parameters $\alpha = 2$ and $\beta = 2$.
Further, suppose that the test is to be based upon a random sample of size
100. In the solution, give $k$, define $A_1, A_2, \ldots, A_k$, and compute each
$p_{i0}$. If possible, compare your proposal with those of other students. Are
any of them the same?

11.15. Let $X_1, X_2, \ldots, X_{48}$ be a random sample of size 48 from a distribution
that has the distribution function $F(x)$. To test $H_0 : F(41) = \frac{1}{4}$ against
$H_1 : F(41) < \frac{1}{4}$, use the statistic $Y$, which is the number of sample
observations less than or equal to 41. If the observed value of $Y$ is $y \le 7$,
reject $H_0$ and accept $H_1$. If $p = F(41)$, find the power function $K(p)$,
$0 < p \le \frac{1}{4}$, of the test. Approximate $\alpha = K(\frac{1}{4})$.

11.16. Let $X_1, X_2, \ldots, X_{100}$ be a random sample of size 100 from a
distribution that has distribution function $F(x)$. To test $H_0 : F(90) - F(60) = \frac{1}{2}$
against $H_1 : F(90) - F(60) > \frac{1}{2}$, use the statistic $Y$, which is the number of
sample observations less than or equal to 90 but greater than 60. If the
observed value of $Y$, say $y$, is such that $y \ge c$, reject $H_0$. Find $c$ so that
$\alpha = 0.05$, approximately.

11.17. Let $X_1, X_2, \ldots, X_n$ be a random sample from some continuous-type
distribution. We wish to consider only unbiased estimators of $\Pr(X \le c)$,
where $c$ is a fixed constant.

(a) What would you use as an unbiased estimator if you had no additional
assumptions about the distribution?

(b) What would you use as an unbiased estimator if you knew the
distribution was normal with unknown mean $\mu$ and variance $\sigma^2 = 1$?

11.4 A Test of Wilcoxon 

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a
distribution with distribution function $F(x)$. We have considered a test
of the hypothesis $F(\xi) = \frac{1}{2}$, $\xi$ given, which is based upon the signs of
the deviations $X_1 - \xi, X_2 - \xi, \ldots, X_n - \xi$. In this section a statistic is
studied that takes into account not only these signs, but also the
magnitudes of the deviations.

To find such a statistic that is distribution-free, we must make two
additional assumptions:

1. $F(x)$ is the distribution function of a continuous type of random
variable $X$.

2. The p.d.f. $f(x)$ of $X$ has a graph that is symmetric about the vertical
axis through $\xi_{0.5}$, the median (which we assume to be unique) of the
distribution.

Thus

$$
F(\xi_{0.5} - x) = 1 - F(\xi_{0.5} + x)
$$

and

$$
f(\xi_{0.5} - x) = f(\xi_{0.5} + x),
$$

for all $x$. Moreover, the probability that any two observations of a
random sample are equal is zero, and in our discussion we shall assume
that no two are equal.

The problem is to test the hypothesis that the median $\xi_{0.5}$ of the
distribution is equal to a fixed number, say $\xi$. Thus we may, in all cases
and without loss of generality, take $\xi = 0$. The reason for this is that
if $\xi \ne 0$, then the fixed $\xi$ can be subtracted from each sample
observation and the resulting variables can be used to test the
hypothesis that their underlying distribution is symmetric about zero.
Hence our conditions on $F(x)$ and $f(x)$ become $F(-x) = 1 - F(x)$ and
$f(-x) = f(x)$, respectively.

To test the hypothesis $H_0 : F(0) = \frac{1}{2}$, we proceed by first ranking
$X_1, X_2, \ldots, X_n$ according to magnitude, disregarding their algebraic
signs. Let $R_i$ be the rank of $|X_i|$ among $|X_1|, |X_2|, \ldots, |X_n|$,
$i = 1, 2, \ldots, n$. For example, if $n = 3$ and if we have $|X_2| < |X_3| < |X_1|$,
then $R_1 = 3$, $R_2 = 1$, and $R_3 = 2$. Thus $R_1, R_2, \ldots, R_n$ is an
arrangement of the first $n$ positive integers $1, 2, \ldots, n$. Further, let $Z_i$,
$i = 1, 2, \ldots, n$, be defined by

$$
Z_i =
\begin{cases}
-1, & \text{if } X_i < 0, \\
\phantom{-}1, & \text{if } X_i > 0.
\end{cases}
$$

If we recall that $\Pr(X_i = 0) = 0$, we see that it does not change the
probabilities whether we associate $Z_i = 1$ or $Z_i = -1$ with the outcome
$X_i = 0$.

The statistic $W = \sum_{i=1}^{n} Z_i R_i$ is the Wilcoxon statistic. Note that in
computing this statistic we simply associate the sign of each $X_i$ with the
rank of its absolute value and sum the resulting $n$ products.
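
The computation of $W$ can be sketched in a few lines of Python. This is only
an illustration under the text's assumptions (no observation equal to zero and
no ties among the absolute values); the function name `wilcoxon_w` and the
sample values are our own.

```python
def wilcoxon_w(xs):
    """Signed-rank statistic W = sum of Z_i * R_i (no zeros or ties in |x| assumed)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    w = 0
    for rank, i in enumerate(order, start=1):   # rank of |x_i| among the |x|'s
        w += rank if xs[i] > 0 else -rank        # Z_i = +1 or -1
    return w

# Here |x_2| < |x_3| < |x_1|, so the ranks are 3, 1, 2 with signs -, +, +.
print(wilcoxon_w([-2.5, 0.3, 1.1]))
```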

If the alternative to the hypothesis $H_0 : \xi_{0.5} = 0$ is $H_1 : \xi_{0.5} > 0$, we
reject $H_0$ if the observed value of $W$ is an element of the set $\{w : w \ge c\}$.
This is due to the fact that large positive values of $W$ indicate that
most of the large deviations from zero are positive. For alternatives
$\xi_{0.5} < 0$ and $\xi_{0.5} \ne 0$ the critical regions are, respectively, the sets
$\{w : w \le c_1\}$ and $\{w : w \le c_2 \text{ or } w \ge c_3\}$. To compute probabilities like
$\Pr(W \ge c; H_0)$, we need to determine the distribution of $W$, under $H_0$.

To help us find the distribution of $W$, when $H_0 : F(0) = \frac{1}{2}$ is true, we
note the following facts:

1. The assumption that $f(x) = f(-x)$ ensures that $\Pr(X_i < 0) =
\Pr(X_i > 0) = \frac{1}{2}$, $i = 1, 2, \ldots, n$.

2. Now $Z_i = -1$ if $X_i < 0$ and $Z_i = 1$ if $X_i > 0$, $i = 1, 2, \ldots, n$. Hence
we have $\Pr(Z_i = -1) = \Pr(Z_i = 1) = \frac{1}{2}$, $i = 1, 2, \ldots, n$. Moreover,
$Z_1, Z_2, \ldots, Z_n$ are independent because $X_1, X_2, \ldots, X_n$ are
independent.

3. The assumption that $f(x) = f(-x)$ also assures that the rank $R_i$ of
$|X_i|$ does not depend upon the sign $Z_i$ of $X_i$. More generally,
$R_1, R_2, \ldots, R_n$ are independent of $Z_1, Z_2, \ldots, Z_n$.

4. A sum $W$ is made up of the numbers $1, 2, \ldots, n$, each number with
either a positive or a negative sign.

The preceding observations enable us to say that $W = \sum_{i=1}^{n} Z_i R_i$
has the same distribution as the random variable $V = \sum_{i=1}^{n} V_i$, where
$V_1, V_2, \ldots, V_n$ are independent and

$$
\Pr(V_i = i) = \Pr(V_i = -i) = \tfrac{1}{2},
$$

$i = 1, 2, \ldots, n$. That $V_1, V_2, \ldots, V_n$ are independent follows from the
fact that $Z_1, Z_2, \ldots, Z_n$ have that property; that is, the numbers
$1, 2, \ldots, n$ always appear in a sum $W$ and those numbers receive
their algebraic signs by independent assignment. Thus each of
$V_1, V_2, \ldots, V_n$ is like one and only one of $Z_1 R_1, Z_2 R_2, \ldots, Z_n R_n$.

Since $W$ and $V$ have the same distribution, the m.g.f. of $W$ is that
of $V$,

$$
M(t) = E\left[\exp\left(t \sum_{i=1}^{n} V_i\right)\right]
= \prod_{i=1}^{n} E\left(e^{tV_i}\right)
= \prod_{i=1}^{n} \frac{e^{-it} + e^{it}}{2} .
$$

We can express $M(t)$ as the sum of terms of the form $(a_j/2^n)e^{b_j t}$. When
$M(t)$ is written in this manner, we can determine by inspection the
p.d.f. of the discrete-type random variable $W$. For example, the
smallest value of $W$ is found from the term $(1/2^n)e^{-t}e^{-2t} \cdots e^{-nt} =
(1/2^n)e^{-n(n+1)t/2}$ and it is $-n(n+1)/2$. The probability of this value
of $W$ is the coefficient $1/2^n$. To make these statements more concrete,
take $n = 3$. Then

$$
\begin{aligned}
M(t) &= \left(\frac{e^{-t} + e^{t}}{2}\right)\left(\frac{e^{-2t} + e^{2t}}{2}\right)\left(\frac{e^{-3t} + e^{3t}}{2}\right) \\
&= \tfrac{1}{8}\left(e^{-6t} + e^{-4t} + e^{-2t} + 2 + e^{2t} + e^{4t} + e^{6t}\right).
\end{aligned}
$$

Thus the p.d.f. of $W$, for $n = 3$, is given by

$$
g(w) =
\begin{cases}
\tfrac{1}{8}, & w = -6, -4, -2, 2, 4, 6, \\
\tfrac{2}{8}, & w = 0, \\
0 & \text{elsewhere.}
\end{cases}
$$
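
Expanding the product of factors by hand becomes tedious as $n$ grows, but
the same bookkeeping is easy to automate. The following Python sketch builds
the exact null distribution of $W$ by adding the independent variables
$V_i = \pm i$ one at a time, which mirrors multiplying the m.g.f. factors; the
function name `wilcoxon_null_pmf` is our own.

```python
from fractions import Fraction

def wilcoxon_null_pmf(n):
    """Exact p.d.f. of W under H_0, built from V = V_1 + ... + V_n with V_i = +/- i."""
    pmf = {0: Fraction(1)}
    for i in range(1, n + 1):
        nxt = {}
        for w, pr in pmf.items():
            for v in (-i, i):                       # V_i = -i or i, each with probability 1/2
                nxt[w + v] = nxt.get(w + v, 0) + pr / 2
        pmf = nxt
    return dict(sorted(pmf.items()))

for w, pr in wilcoxon_null_pmf(3).items():
    print(w, pr)
```

For $n = 3$ this lists the seven values $-6, -4, \ldots, 6$ with their exact
probabilities as fractions.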


The mean and the variance of $W$ are more easily computed
directly than by working with the m.g.f. $M(t)$. Because $V = \sum_{i=1}^{n} V_i$
and $W = \sum_{i=1}^{n} Z_i R_i$ have the same distribution, they have the same mean
and the same variance. When the hypothesis $H_0 : F(0) = \frac{1}{2}$ is true, it is
easy to determine the values of these two characteristics of the
distribution of $W$. Since $E(V_i) = 0$, $i = 1, 2, \ldots, n$, we have

$$
\mu_W = E(W) = \sum_{i=1}^{n} E(V_i) = 0 .
$$

The variance of $V_i$ is $(-i)^2(\frac{1}{2}) + (i)^2(\frac{1}{2}) = i^2$. Thus the variance of $W$ is

$$
\sigma_W^2 = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} .
$$


For large values of $n$, the determination of the exact distribution
of $W$ becomes tedious. Accordingly, one looks for an approximating
distribution. Although $W$ is distributed as the sum of $n$ random
variables that are independent, our form of the central limit theorem
cannot be applied because the $n$ random variables do not have identical
distributions. However, a more general theorem, due to Liapounov,
states that if $U_i$ has mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, 2, \ldots, n$, if
$U_1, U_2, \ldots, U_n$ are independent, if $E(|U_i - \mu_i|^3)$ is finite for every $i$,
and if

$$
\lim_{n \to \infty} \frac{\sum_{i=1}^{n} E(|U_i - \mu_i|^3)}{\left(\sum_{i=1}^{n} \sigma_i^2\right)^{3/2}} = 0,
$$

then

$$
\frac{\sum_{i=1}^{n} U_i - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}
$$

has a limiting distribution that is $N(0, 1)$. For our variables
$V_1, V_2, \ldots, V_n$ we have

$$
\sum_{i=1}^{n} E(|V_i - \mu_i|^3) = \sum_{i=1}^{n} \left[ i^3\left(\tfrac{1}{2}\right) + i^3\left(\tfrac{1}{2}\right) \right] = \sum_{i=1}^{n} i^3 ;
$$

and it is known that

$$
\sum_{i=1}^{n} i^3 = \frac{n^2(n+1)^2}{4} .
$$

Now

$$
\lim_{n \to \infty} \frac{n^2(n+1)^2/4}{[n(n+1)(2n+1)/6]^{3/2}} = 0
$$

because the numerator is of order $n^4$ and the denominator is of order
$n^{9/2}$. Thus

$$
\frac{W}{\sqrt{n(n+1)(2n+1)/6}}
$$

is approximately $N(0, 1)$ when $H_0$ is true. This allows us to approximate
probabilities like $\Pr(W \ge c; H_0)$ when the sample size $n$ is large.

Example 1. Let $\xi_{0.5}$ be the median of a symmetric distribution that is of
the continuous type. To test, with $\alpha = 0.01$, the hypothesis $H_0 : \xi_{0.5} = 75$
against $H_1 : \xi_{0.5} > 75$, we observed a random sample of size $n = 18$. Let it be
given that the deviations of these 18 values from 75 are the following numbers:

1.5, -0.5, 1.6, 0.4, 2.3, -0.8, 3.2, 0.9, 2.9,
0.3, 1.8, -0.1, 1.2, 2.5, 0.6, -0.7, 1.9, 1.3.

The experimental value of the Wilcoxon statistic is equal to

$$
\begin{aligned}
w &= 11 - 4 + 12 + 3 + 15 - 7 + 18 + 8 + 17 + 2 + 13 - 1 \\
&\quad + 9 + 16 + 5 - 6 + 14 + 10 = 135 .
\end{aligned}
$$

Since, with $n = 18$, $\sqrt{n(n+1)(2n+1)/6} = 45.92$, we have that

$$
0.01 = \Pr\left(\frac{W}{45.92} \ge 2.326\right) = \Pr(W \ge 106.8) .
$$

Because $w = 135 > 106.8$, we reject $H_0$ at the approximate 0.01 significance
level. The $p$-value associated with $135/45.92 = 2.94$ is about $1 - 0.998 =
0.002$ since $\Phi(2.94) = 0.998$.
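
The arithmetic of this example can be reproduced with the Python standard
library. The sketch below is only a check of the numbers quoted above, using
the normal approximation through `statistics.NormalDist`; the variable names
are our own.

```python
from math import sqrt
from statistics import NormalDist

deviations = [1.5, -0.5, 1.6, 0.4, 2.3, -0.8, 3.2, 0.9, 2.9,
              0.3, 1.8, -0.1, 1.2, 2.5, 0.6, -0.7, 1.9, 1.3]
n = len(deviations)

# Rank the absolute deviations, then attach each deviation's sign to its rank.
by_magnitude = sorted(deviations, key=abs)
w = sum(rank if d > 0 else -rank for rank, d in enumerate(by_magnitude, start=1))

sd = sqrt(n * (n + 1) * (2 * n + 1) / 6)      # standard deviation of W under H_0
z = w / sd
p_value = 1 - NormalDist().cdf(z)             # one-sided, normal approximation
print(w, round(sd, 2), round(z, 2), round(p_value, 3))
```

The printed statistic, standard deviation, and standardized value should
match the 135, 45.92, and 2.94 reported in the example.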

There are many modifications and generalizations of the Wilcoxon
statistic. One generalization is the following: Let $c_1 \le c_2 \le \cdots \le c_n$ be
nonnegative numbers. Then, in the Wilcoxon statistic, replace the
ranks $1, 2, \ldots, n$ by $c_1, c_2, \ldots, c_n$, respectively. For example, if $n = 3$
and if we have $|X_2| < |X_3| < |X_1|$, then $R_1 = 3$ is replaced by $c_3$, $R_2 = 1$
by $c_1$, and $R_3 = 2$ by $c_2$. In this example, the generalized statistic is
given by $Z_1 c_3 + Z_2 c_1 + Z_3 c_2$. Similar to the Wilcoxon statistic,
this generalized statistic is distributed, under $H_0$, as the sum of $n$
independent random variables, the $i$th of which takes each of the values
$c_i \ne 0$ and $-c_i$ with probability $\frac{1}{2}$; if $c_i = 0$, that variable takes the
value $c_i = 0$ with probability 1. Some special cases of this statistic
are proposed in the Exercises.


EXERCISES 

11.18. The observed values of a random sample of size 10 from a distribution
that is symmetric about $\xi_{0.5}$ are 10.2, 14.1, 9.2, 11.3, 7.2, 9.8, 6.5, 11.8, 8.7,
10.8. Use Wilcoxon's statistic to test the hypothesis $H_0 : \xi_{0.5} = 8$ against
$H_1 : \xi_{0.5} > 8$ if $\alpha = 0.05$. Even though $n$ is small, use the normal
approximation and find the $p$-value.

11.19. Find the distribution of $W$ for $n = 4$ and $n = 5$.
Hint: Multiply the moment-generating function of $W$, with $n = 3$, by
$(e^{-4t} + e^{4t})/2$ to get that of $W$, with $n = 4$.

11.20. Let $X_1, X_2, \ldots, X_n$ be independent. If the p.d.f. of $X_i$ is uniform over
the interval $(-2^{1-i}, 2^{1-i})$, $i = 1, 2, 3, \ldots$, show that Liapounov's
condition is not satisfied. The sum $\sum_{i=1}^{n} X_i$ does not have an approximate
normal distribution because the first random variables in the sum tend to
dominate it.

11.21. If $n = 4$ and, in the notation of the text, $c_1 = 1$, $c_2 = 2$, $c_3 = c_4 = 3$, find
the distribution of the generalization of the Wilcoxon statistic, say $W_g$. For
a general $n$, find the mean and the variance of $W_g$ if $c_i = i$, $i \le n/2$, and
$c_i = [n/2] + 1$, $i > n/2$, where $[z]$ is the greatest integer function. Does
Liapounov's condition hold here?

11.22. A modification of Wilcoxon's statistic that is frequently used is
achieved by replacing $R_i$ by $R_i - 1$; that is, use the modification
$W_m = \sum_{i=1}^{n} Z_i(R_i - 1)$. Show that $W_m/\sqrt{(n-1)n(2n-1)/6}$ has a limiting
distribution that is $N(0, 1)$.

11.23. If, in the discussion of the generalization of the Wilcoxon statistic, we
let $c_1 = c_2 = \cdots = c_n = 1$, show that we obtain a statistic equivalent to that
used in the sign test.

11.24. If $c_1, c_2, \ldots, c_n$ are selected so that
$i/(n+1) = \int_0^{c_i} \sqrt{2/\pi}\, e^{-x^2/2}\, dx$,
$i = 1, 2, \ldots, n$, the generalized Wilcoxon $W_g$ is an example of a normal
scores statistic. If $n = 9$, compute the mean and the variance of this $W_g$.

11.25. If $c_i = 2^i$, $i = 1, 2, \ldots, n$, the corresponding $W_g$ is called the binary
statistic. Find the mean and the variance of this $W_g$. Is Liapounov's
condition satisfied?

11.26. In the definition of Wilcoxon's statistic, let $W_1$ be the sum of the ranks
of those observations of the sample that are positive and let $W_2$ be the sum
of the ranks of those observations that are negative. Then $W = W_1 - W_2$.

(a) Show that $W = 2W_1 - n(n+1)/2$ and $W = n(n+1)/2 - 2W_2$.

(b) Compute the mean and the variance of each of $W_1$ and $W_2$.

11.27. Let $X_1, X_2, \ldots, X_{2n}$ be a random sample of size $2n$ from a
continuous-type distribution that is symmetric about zero. Modify the
Wilcoxon statistic by replacing the scores (ranks) $1, 2, \ldots, 2n$ by the scores
consisting of $n$ ones and $n$ twos. Call this statistic $W$.

(a) Find the variance of $W$.

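(b) Argue that $E(e^{tW}) = \left[\dfrac{e^{-t} + e^{t}}{2}\right]^n \left[\dfrac{e^{-2t} + e^{2t}}{2}\right]^n$.

(c) Evaluate $\lim_{n \to \infty} E[e^{tW/\sqrt{5n}}]$. What is the limiting distribution of $W/\sqrt{5n}$?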

11.5 The Equality of Two Distributions 

In Sections 11.3 and 11.4, some tests of hypotheses about one
distribution were investigated. In this section, as in the next section,
various tests of the equality of two distributions are studied. By the
equality of two distributions, we mean that the two distribution
functions, say $F$ and $G$, have $F(z) = G(z)$ for all values of $z$.

The first test that we discuss is a natural extension of the chi-square
test. Let $X$ and $Y$ be independent variables with distribution
functions $F(x)$ and $G(y)$, respectively. We wish to test the
hypothesis that $F(z) = G(z)$, for all $z$. Let us partition the real line into
$k$ mutually disjoint sets $A_1, A_2, \ldots, A_k$. Define

$$
p_{i1} = \Pr(X \in A_i), \quad i = 1, 2, \ldots, k,
$$

and

$$
p_{i2} = \Pr(Y \in A_i), \quad i = 1, 2, \ldots, k.
$$

If $F(z) = G(z)$, for all $z$, then $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$. Accordingly, the
hypothesis that $F(z) = G(z)$, for all $z$, is replaced by the less restrictive
hypothesis

$$
H_0 : p_{i1} = p_{i2}, \quad i = 1, 2, \ldots, k.
$$

But this is exactly the problem of testing the equality of two
multinomial distributions that was considered in Example 3, Section
6.6, and the reader is referred to that example for the details.

Some statisticians prefer a procedure which eliminates some of the
subjectivity of selecting the partitions. For a fixed positive integer $k$,
proceed as follows. Consider a random sample of size $m$ from the
distribution of $X$ and an independent random sample of size $n$ from
the distribution of $Y$. Let the experimental values be denoted by
$x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_n$. Then combine the two samples into
one sample of size $m + n$ and order the $m + n$ values (not their absolute
values) in ascending order of magnitude. These ordered items are then
partitioned into $k$ parts in such a way that each part has the same
number of items. (If the sample sizes are such that this is impossible,
a partition with approximately the same number of items in each group
suffices.) In effect, then, the partition $A_1, A_2, \ldots, A_k$ is determined by
the experimental values themselves. This does not alter the fact that the
statistic, discussed in Example 3, Section 6.6, has a limiting distribution
that is $\chi^2(k - 1)$. Accordingly, the procedures used in that example may
be used here.

Among the tests of this type there is one that is frequently used.
It is essentially a test of the equality of the medians of two
distributions. To simplify the discussion, we assume that $m + n$,
the size of the combined sample, is an even number, say $m + n = 2h$,
where $h$ is a positive integer. We take $k = 2$ and then the combined
sample of size $m + n = 2h$, which has been ordered, is separated into
two parts, a "lower half" and an "upper half," each containing
$h = (m + n)/2$ of the experimental values of $X$ and $Y$. The statistic,
suggested by Example 3, Section 6.6, could be used because it has, when
$H_0$ is true, a limiting distribution that is $\chi^2(1)$. However, it is more
interesting to find the exact distribution of another statistic which
enables us to test the hypothesis $H_0$ against the alternative
$H_1 : F(z) \ge G(z)$ or against the alternative $H_1 : F(z) \le G(z)$ as opposed
to merely $F(z) \ne G(z)$. [Here, and in the sequel, alternatives
$F(z) \ge G(z)$ and $F(z) \le G(z)$ and $F(z) \ne G(z)$ mean that strict
inequality holds on some set of positive probability measure.]
This other statistic is $V$, which is the number of observed values of $X$
that are in the lower half of the combined sample. If the observed value
of $V$ is quite large, one might suspect that the median of the distribution
of $X$ is smaller than that of the distribution of $Y$. Thus the critical region
of this test of the hypothesis $H_0 : F(z) = G(z)$, for all $z$, against
$H_1 : F(z) \ge G(z)$ is of the form $V \ge c$. Because our combined sample
is of even size, there is no unique median of the sample. However, one
can arbitrarily insert a number between the $h$th and $(h + 1)$st ordered
items and call it the median of the sample. On this account, a test of
the sort just described is called a median test. Incidentally, if the
alternative hypothesis is $H_1 : F(z) \le G(z)$, the critical region is of the
form $V \le c$.


The distribution of $V$ is quite easy to find if the distribution
functions $F(x)$ and $G(y)$ are of the continuous type and if $F(z) = G(z)$,
for all $z$. We shall now show that $V$ has a hypergeometric p.d.f. Let
$m + n = 2h$, $h$ a positive integer. To compute $\Pr(V = v)$, we need the
probability that exactly $v$ of $X_1, X_2, \ldots, X_m$ are in the lower half of the
ordered combined sample. Under our assumptions, the probability is
zero that any two of the $2h$ random variables are equal. The smallest
$h$ of the $m + n = 2h$ items can be selected in any one of $\binom{m+n}{h}$
ways. Each of these ways has the same probability. Of these $\binom{m+n}{h}$
ways, we need to count the number of those in which exactly $v$ of the
$m$ values of $X$ (and hence $h - v$ of the $n$ values of $Y$)
appear in the lower $h$ items. But this is $\binom{m}{v}\binom{n}{h-v}$. Thus the p.d.f.
of $V$ is the hypergeometric p.d.f.

$$
k(v) = \Pr(V = v) =
\begin{cases}
\dfrac{\binom{m}{v}\binom{n}{h-v}}{\binom{m+n}{h}}, & v = 0, 1, 2, \ldots, m, \\
0 & \text{elsewhere,}
\end{cases}
$$

where $m + n = 2h$.

The reader may be momentarily puzzled by the meaning of $\binom{n}{h-v}$
for $v = 0, 1, 2, \ldots, m$. For example, let $m = 17$, $n = 3$, so that $h = 10$.
Then we have $\binom{3}{10-v}$, $v = 0, 1, \ldots, 17$. However, we take
$\binom{n}{h-v}$ to be zero if $h - v$ is negative or if $h - v > n$.

If $m + n$ is an odd number, say $m + n = 2h + 1$, it is left to the
reader to show that the p.d.f. $k(v)$ gives the probability that exactly $v$
of the $m$ values of $X$ are among the lower $h$ of the combined $2h + 1$
values; that is, exactly $v$ of the $m$ values of $X$ are less than the median
of the combined sample.
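
The hypergeometric p.d.f. of $V$, including the convention that an
out-of-range binomial coefficient is zero, can be sketched in Python as
follows. The helper names `binom` and `median_test_pmf` are our own, and the
choice $h = (m + n) // 2$ is an assumption that covers both the even case and
the odd case $m + n = 2h + 1$.

```python
from math import comb

def binom(a, b):
    """Binomial coefficient, taken to be zero when b < 0 or b > a."""
    return comb(a, b) if 0 <= b <= a else 0

def median_test_pmf(m, n):
    """k(v) = Pr(V = v), v = 0, ..., m, where h = (m + n) // 2."""
    h = (m + n) // 2
    total = binom(m + n, h)
    return {v: binom(m, v) * binom(n, h - v) / total for v in range(m + 1)}

pmf = median_test_pmf(17, 3)          # the illustration with m = 17, n = 3, h = 10
print({v: round(pr, 4) for v, pr in pmf.items() if pr > 0})
```

With $m = 17$ and $n = 3$ only the values of $v$ for which $0 \le 10 - v \le 3$
receive positive probability.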

If the distribution functions $F(x)$ and $G(y)$ are of the continuous
type, there is another rather simple test of the hypothesis that
$F(z) = G(z)$, for all $z$. This test is based upon the notion of runs of values
of $X$ and of values of $Y$. We shall now explain what we mean by runs.
Let us again combine the sample of $m$ values of $X$ and the sample of
$n$ values of $Y$ into one collection of $m + n$ ordered items arranged in
ascending order of magnitude. With $m = 7$ and $n = 8$ we might find
that the 15 ordered items were in the arrangement

x yyy xx y x yy xxx yy


Note that in this ordering we have separated the groups of successive
values of the random variable $X$ and those of the random variable
$Y$. If we read from left to right, we would say that we have a run of
one value of $X$, followed by a run of three values of $Y$, followed by
a run of two values of $X$, and so on. In our example, there is a
total of eight runs. Three are runs of length 1; three are runs of
length 2; and two are runs of length 3. Note that the total number of
runs is always one more than the number of unlike adjacent symbols.

Of what can runs be suggestive? Suppose that with $m = 7$ and
$n = 8$ we have the following ordering:

xxxxx y xx yyyyyyy.

To us, this strongly suggests that $F(z) > G(z)$. For if, in fact,
$F(z) = G(z)$ for all $z$, we would anticipate a greater number of runs. And
if the first run of five values of $X$ were interchanged with the last run
of seven values of $Y$, this would suggest that $F(z) < G(z)$. But runs can
be suggestive of other things. For example, with $m = 7$ and $n = 8$,
consider the runs

yyyy xxxxxxx yyyy.

This suggests to us that the medians of the distributions of $X$ and $Y$
may very well be about the same, but that the "spread" (measured
possibly by the standard deviation) of the distribution of $X$ is
considerably less than that of the distribution of $Y$.

Let the random variable R equal the number of runs in the 
combined sample, once the combined sample has been ordered. 
Because our random variables X and Y are of the continuous type, we 
may assume that no two of these sample items are equal. We wish to 
find the p.d.f. of R. To find this distribution, when F(z) = G(z), we shall 
suppose that all arrangements of the m values of X and the n values 
of Y have equal probabilities. We shall show that 


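\[
\begin{aligned}
\Pr(R = 2k + 1) &= \frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1}\binom{n-1}{k}}{\binom{m+n}{m}}, \\
\Pr(R = 2k) &= \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}},
\end{aligned}
\tag{1}
\]

when 2k and 2k + 1 are elements of the space of R. 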

To prove formulas (1), note that we can select the m positions for 
the m values of X from the m + n positions in any one of $\binom{m+n}{m}$ ways. 
Since each of these choices yields one arrangement, the probability of 
each arrangement is equal to $1/\binom{m+n}{m}$. The problem is now to 
determine how many of these arrangements yield R = r, where r 
is an integer in the space of R. First, let r = 2k + 1, where k is a 
positive integer. This means that there must be k + 1 runs of the 
ordered values of X and k runs of the ordered values of Y or vice versa. 
Consider first the number of ways of obtaining k + 1 runs of 
the m values of X. We can form k + 1 of these runs by inserting k 
“dividers” into the m - 1 spaces between the values of X, with no more 
than one divider per space. This can be done in any one of 
$\binom{m-1}{k}$ ways. Similarly, we can construct k runs of the n values of Y 
by inserting k - 1 dividers into the n - 1 spaces between the values of 
Y, with no more than one divider per space. This can be done in any 
one of $\binom{n-1}{k-1}$ ways. The joint operation can be performed in any one 
of $\binom{m-1}{k}\binom{n-1}{k-1}$ ways. These two sets of runs can be placed together 
to form r = 2k + 1 runs. But we could also have k runs of the values 
of X and k + 1 runs of the values of Y. An argument similar to the 
preceding shows that this can be effected in any one of 
$\binom{m-1}{k-1}\binom{n-1}{k}$ ways. Thus 

\[
\Pr(R = 2k + 1) = \frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1}\binom{n-1}{k}}{\binom{m+n}{m}},
\]

which is the first of formulas (1). 

If r = 2k, where k is a positive integer, we see that the ordered values 
of X and the ordered values of Y must each be separated into k runs. 
These operations can be performed in any one of $\binom{m-1}{k-1}$ and $\binom{n-1}{k-1}$ 
ways, respectively. These two sets of runs can be placed together to 
form r = 2k runs. But we may begin with either a run of values of X 
or a run of values of Y. Accordingly, the probability of 2k runs is 

\[
\Pr(R = 2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}},
\]


which is the second of formulas (1). 
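
As a check on formulas (1), the following sketch evaluates them exactly and compares the result with a brute-force count over all equally likely arrangements. The function names `run_pmf` and `run_pmf_brute_force` are hypothetical, and the sketch assumes m ≥ 1 and n ≥ 1.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb

def run_pmf(m, n):
    """p.d.f. of R under H0 computed from formulas (1), as exact fractions."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k, odd = divmod(r, 2)
        if odd:   # r = 2k + 1
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        else:     # r = 2k
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf

def run_pmf_brute_force(m, n):
    """Same p.d.f. obtained by listing every choice of the m positions of X."""
    counts = {}
    for x_positions in combinations(range(m + n), m):
        labels = ["y"] * (m + n)
        for position in x_positions:
            labels[position] = "x"
        r = sum(1 for _ in groupby(labels))   # number of runs
        counts[r] = counts.get(r, 0) + 1
    total = comb(m + n, m)
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}

print(run_pmf(3, 4))
```

For small m and n the two functions should return the same dictionary; the brute-force version becomes impractical quickly because it visits all $\binom{m+n}{m}$ arrangements.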

If the critical region of this run test of the hypothesis 
H_0 : F(z) = G(z) for all z is of the form R ≤ c, it is easy to compute 
α = Pr (R ≤ c; H_0), provided that m and n are small. Although it is not 
easy to show, the distribution of R can be approximated, with large 
sample sizes m and n, by a normal distribution with mean 

\[
\mu = E(R) = 2\,\frac{mn}{m + n} + 1
\]

and variance 

\[
\sigma^2 = \frac{(\mu - 1)(\mu - 2)}{m + n - 1}.
\]
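
A small numerical sketch may help here. It computes the mean and variance of R from the p.d.f. in formulas (1) for m = 7 and n = 8 and compares them with the mean and variance stated for the normal approximation. The cutoff c = 5 and the continuity correction of 0.5 are assumptions made only for illustration; they are not taken from the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def run_pmf(m, n):
    """p.d.f. of R under H0 from formulas (1), as floats."""
    total = comb(m + n, m)
    pmf = {}
    for r in range(2, m + n + 1):
        k, odd = divmod(r, 2)
        if odd:
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        else:
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        pmf[r] = ways / total
    return pmf

def normal_moments(m, n):
    """Mean and variance used in the normal approximation."""
    mu = 2 * m * n / (m + n) + 1
    var = (mu - 1) * (mu - 2) / (m + n - 1)
    return mu, var

m, n = 7, 8
pmf = run_pmf(m, n)
exact_mean = sum(r * p for r, p in pmf.items())
exact_var = sum(r * r * p for r, p in pmf.items()) - exact_mean ** 2
mu, var = normal_moments(m, n)

c = 5  # hypothetical cutoff for a critical region R <= c
exact_tail = sum(p for r, p in pmf.items() if r <= c)
approx_tail = NormalDist(mu, sqrt(var)).cdf(c + 0.5)  # continuity correction assumed

print(exact_mean, mu)
print(exact_var, var)
print(exact_tail, approx_tail)
```

In this small case the moments computed from the p.d.f. agree with the stated mean and variance, so only the shape of the distribution is being approximated.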


The run test may also be used to test for randomness. That is, it can 
be used as a check to see if it is reasonable to treat X_1, X_2, ..., X_s as 
a random sample of size s from some continuous distribution. To 
facilitate the discussion, take s to be even. We are given the s values 
of X to be x_1, x_2, ..., x_s, which are not ordered by magnitude but by 
the order in which they were observed. However, there are s/2 of these 
values, each of which is smaller than the remaining s/2 values. Thus we 
have a “lower half” and an “upper half” of these values. In the 
sequence x_1, x_2, ..., x_s, replace each value X that is in the lower half 
by the letter L and each value in the upper half by the letter U. Then, 
for example, with s = 10, a sequence such as 

LLLLULUUUU 


may suggest a trend toward increasing values of X; that is, these 
values of X may not reasonably be looked upon as being the 
observations of a random sample. If trend is the only alternative to 
randomness, we can make a test based upon R and reject the hypothesis 
of randomness if R ≤ c. To make this test, we would use the p.d.f. of 
R with m = n = s/2. On the other hand if, with s = 10, we find a 
sequence such as 

LULULULULU, 

our suspicions are aroused that there may be a nonrandom effect which 
is cyclic even though R = 10. Accordingly, to test for a trend or a cyclic 
effect, we could use a critical region of the form R ≤ c_1 or R ≥ c_2. 

If the sample size s is odd, the number of sample items in the “upper 
half” and the number in the “lower half” will differ by one. Then, 
for example, we could use the p.d.f. of R with m = (s - 1)/2 and 
n = (s + 1)/2, or vice versa. 
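
The labeling step for the randomness test can be sketched as follows. The data values are hypothetical, chosen only so that they reproduce the two letter sequences shown in the text; the helper names are also hypothetical, and the sketch assumes there are no ties. For odd s it puts (s - 1)/2 values in the lower half, which is one of the two choices mentioned above.

```python
from itertools import groupby

def lower_upper_labels(values):
    """Label each value L (lower half) or U (upper half) in observation order."""
    ordered = sorted(values)
    cutoff = ordered[len(values) // 2]   # smallest value of the upper half
    return "".join("L" if v < cutoff else "U" for v in values)

def count_runs(labels):
    """Number of runs in a string of labels."""
    return sum(1 for _ in groupby(labels))

# Hypothetical observations, listed in the order in which they were observed.
trend_data = [1.2, 1.9, 2.3, 2.8, 5.1, 3.0, 5.6, 6.2, 6.8, 7.4]
cyclic_data = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]

for data in (trend_data, cyclic_data):
    labels = lower_upper_labels(data)
    print(labels, count_runs(labels))
```

A small number of runs points toward a trend, while the largest possible number of runs points toward a cyclic effect.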

EXERCISES 


11.28. Let 3.1, 5.6, 4.7, 3.8, 4.2, 3.0, 5.1, 3.9, 4.8 and 5.3, 4.0, 4.9, 6.2, 3.7, 
5.0, 6.5, 4.5, 5.5, 5.9, 4.4, 5.8 be observed independent samples of sizes 
m = 9 and n = 12 from two distributions. With k = 3, use a chi-square test 
to test, with α = 0.05 approximately, the equality of the two distributions. 

11.29. In the median test, with m = 9 and n = 7, find the p.d.f. of the random 
variable V, the number of values of X in the lower half of the combined 
sample. In particular, what are the values of the probabilities Pr (V = 0) and 
Pr (V = 9)? 


11.30. In the notation of the text, use the median test and the data given in 
Exercise 11.28 to test, with α = 0.05, approximately, the hypothesis of the 
equality of the two distributions against the alternative hypothesis that 
F(z) > G(z). If the exact probabilities are too difficult to determine for 
m = 9 and n = 12, approximate these probabilities. 

11.31. Using the notation of this section, let U be the number of observed 
values of X in the smallest d items of the combined sample of m + n items. 
Argue that 

\[
\Pr(U = u) = \frac{\binom{m}{u}\binom{n}{d-u}}{\binom{m+n}{d}}, \qquad u = 0, 1, \ldots, m.
\]

The statistic U could be used to test the equality of the (100p)th percentiles, 
where (m + n)p = d, of the distributions of X and Y. 


11.32. In the discussion of the run test, let the random variables R_1 and R_2 
be, respectively, the number of runs of the values of X and the number of 
runs of the values of Y. Then R = R_1 + R_2. Let the pair (r_1, r_2) of integers 
be in the space of (R_1, R_2); then |r_1 - r_2| ≤ 1. Show that the joint p.d.f. of 
R_1 and R_2 is 

\[
\frac{2\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}} \quad \text{if } r_1 = r_2;
\]

that this joint p.d.f. is 

\[
\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}} \quad \text{if } |r_1 - r_2| = 1;
\]

and is zero elsewhere. Show that the marginal p.d.f. of R_1 is 

\[
\frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad r_1 = 1, \ldots, m,
\]

and is zero elsewhere. Find E(R_1). In a similar manner, find E(R_2). Compute 
E(R) = E(R_1) + E(R_2). 


11.6 The Mann-Whitney-Wilcoxon Test 

We return to the problem of testing the equality of two distributions 
of the continuous type. Let X and Y be independent random variables 
of the continuous type. Let F(x) and G(y) denote, respectively, 
the distribution functions of X and Y and let X_1, X_2, ..., X_m 
and Y_1, Y_2, ..., Y_n denote independent samples from these 
distributions. We shall discuss the Mann-Whitney-Wilcoxon test of the 
hypothesis H_0 : F(z) = G(z) for all values of z. 

Let us define 

\[
Z_{ij} = \begin{cases} 1, & X_i < Y_j, \\ 0, & X_i > Y_j, \end{cases}
\]

and consider the statistic 

\[
U = \sum_{j=1}^{n} \sum_{i=1}^{m} Z_{ij}.
\]

We note that 

\[
\sum_{i=1}^{m} Z_{ij}
\]

counts the number of values of X that are less than Y_j, j = 1, 2, ..., n. 
Thus U is the sum of these n counts. For example, with m = 4 and 
n = 3, consider the observations 

x_2 < y_3 < x_1 < x_4 < y_1 < x_3 < y_2. 

There are three values of x that are less than y_1; there are four values 
of x that are less than y_2; and there is one value of x that is less than 
y_3. Thus the experimental value of U is u = 3 + 4 + 1 = 8. 
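
The double sum that defines U translates directly into code. In this sketch the function name is hypothetical and the numerical values are invented; they were chosen only so that their ordering matches the ordering of the observations in the example.

```python
def mann_whitney_u(xs, ys):
    """Count the pairs (x_i, y_j) with x_i < y_j; ties are assumed absent."""
    return sum(1 for y in ys for x in xs if x < y)

# Hypothetical values with x_2 < y_3 < x_1 < x_4 < y_1 < x_3 < y_2.
x = [3, 1, 6, 4]   # x_1, x_2, x_3, x_4
y = [5, 7, 2]      # y_1, y_2, y_3

# Number of x values below each y_j, then their sum.
counts = [sum(xi < yj for xi in x) for yj in y]
u = mann_whitney_u(x, y)
print(counts, u)
```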

Clearly, the smallest value which U can take is zero, and the largest 
value is mn. Thus the space of U is {u : u = 0, 1, 2, ..., mn}. If U is 
large, the values of Y tend to be larger than the values of X, and this 
suggests that F(z) ≥ G(z) for all z. On the other hand, a small value of 
U suggests that F(z) ≤ G(z) for all z. Thus, if we test the hypothesis 
H_0 : F(z) = G(z) for all z against the alternative hypothesis 
H_1 : F(z) ≥ G(z) for all z, the critical region is of the form U ≥ c_1. If 
the alternative hypothesis is H_1 : F(z) ≤ G(z) for all z, the critical region 
is of the form U ≤ c_2. To determine the size of a critical region, we need 
the distribution of U when H_0 is true. 

If u belongs to the space of U, let us denote Pr (U = u) by the symbol 
h(u; m, n). This notation focuses attention on the sample sizes m 
and n. To determine the probability h(u; m, n), we first note that 
we have m + n positions to be filled by m values of X and n values of 
Y. We can fill m positions with the values of X in any one of $\binom{m+n}{m}$ 
ways. Once this has been done, the remaining n positions can be 
filled with the values of Y. When H_0 is true, each of these arrangements 
has the same probability, $1/\binom{m+n}{m}$. The final right-hand position of 
an arrangement may be either a value of X or a value of Y. This position 
can be filled in any one of m + n ways, m of which are favorable to X 
and n of which are favorable to Y. Accordingly, the probability that 
an arrangement ends with a value of X is m/(m + n) and the probability 
that an arrangement terminates with a value of Y is n/(m + n). 

Now U can equal u in two mutually exclusive and exhaustive ways: 
(1) The final right-hand position (the largest of the m + n values) in the 
arrangement may be a value of X and the remaining (m - 1) values of 
X and the n values of Y can be arranged so as to have U = u. The 
probability that U = u, given an arrangement that terminates with a 
value of X, is given by h(u; m - 1, n). Or (2) the largest value in the 
arrangement can be a value of Y. This value of Y is greater than m 
values of X. If we are to have U = u, the sum of n - 1 counts of the 
m values of X with respect to the remaining n - 1 values of Y must be 
u - m. Thus the probability that U = u, given an arrangement that 
terminates in a value of Y, is given by h(u - m; m, n - 1). Accordingly, 
the probability that U = u is 

\[
h(u; m, n) = \frac{m}{m+n}\, h(u; m - 1, n) + \frac{n}{m+n}\, h(u - m; m, n - 1).
\]


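We impose the following reasonable restrictions upon the function 
h(u; m, n): 

\[
h(u; 0, n) = \begin{cases} 1, & u = 0, \\ 0, & u > 0, \end{cases} \qquad n \ge 1,
\]

and 

\[
h(u; m, 0) = \begin{cases} 1, & u = 0, \\ 0, & u > 0, \end{cases} \qquad m \ge 1,
\]

and 

\[
h(u; m, n) = 0, \qquad u < 0, \quad m \ge 0, \quad n \ge 0.
\]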

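Then it is easy, for small values of m and n, to compute these 
probabilities. For example, if m = n = 1, we have 

\[
\begin{aligned}
h(0; 1, 1) &= \tfrac12 h(0; 0, 1) + \tfrac12 h(-1; 1, 0) = \tfrac12 \cdot 1 + \tfrac12 \cdot 0 = \tfrac12, \\
h(1; 1, 1) &= \tfrac12 h(1; 0, 1) + \tfrac12 h(0; 1, 0) = \tfrac12 \cdot 0 + \tfrac12 \cdot 1 = \tfrac12;
\end{aligned}
\]

and if m = 1, n = 2, we have 

\[
\begin{aligned}
h(0; 1, 2) &= \tfrac13 h(0; 0, 2) + \tfrac23 h(-1; 1, 1) = \tfrac13 \cdot 1 + \tfrac23 \cdot 0 = \tfrac13, \\
h(1; 1, 2) &= \tfrac13 h(1; 0, 2) + \tfrac23 h(0; 1, 1) = \tfrac13 \cdot 0 + \tfrac23 \cdot \tfrac12 = \tfrac13, \\
h(2; 1, 2) &= \tfrac13 h(2; 0, 2) + \tfrac23 h(1; 1, 1) = \tfrac13 \cdot 0 + \tfrac23 \cdot \tfrac12 = \tfrac13.
\end{aligned}
\]

In Exercise 11.33 the reader is to determine the distribution of U when 
m = 2, n = 1; m = 2, n = 2; m = 1, n = 3; and m = 3, n = 1. 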
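
The recursion and its boundary restrictions can be sketched with exact rational arithmetic. The function name `h` follows the notation of the text; the use of memoization is an implementation choice made here, not something the text prescribes.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def h(u, m, n):
    """Pr(U = u) under H0 for sample sizes m and n, via the recursion."""
    if u < 0:
        return Fraction(0)
    if m == 0 or n == 0:
        # With one sample empty, U can only equal zero.
        return Fraction(1) if u == 0 else Fraction(0)
    return (Fraction(m, m + n) * h(u, m - 1, n)
            + Fraction(n, m + n) * h(u - m, m, n - 1))

# The two small cases worked out in the text.
print([h(u, 1, 1) for u in range(2)])
print([h(u, 1, 2) for u in range(3)])
```

Because the values are stored as fractions, the probabilities for a given pair (m, n) can be summed and compared with 1 without rounding error.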

For large values of m and n, it is desirable to use an approximate 
distribution of U. Consider the mean and the variance of U when 
the hypothesis H_0 : F(z) = G(z), for all values of z, is true. Since 
$U = \sum_{j=1}^{n}\sum_{i=1}^{m} Z_{ij}$, then 

\[
E(U) = \sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}).
\]

But 

\[
E(Z_{ij}) = (1)\Pr(X_i < Y_j) + (0)\Pr(X_i > Y_j) = \tfrac12
\]

because, when H_0 is true, Pr (X_i < Y_j) = Pr (X_i > Y_j) = 1/2. Thus 

\[
E(U) = \frac{mn}{2}.
\]

To compute the variance of U, we first find 

\[
\begin{aligned}
E(U^2) &= \sum_{k=1}^{n}\sum_{h=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}Z_{hk}) \\
&= \sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}^2)
 + \sum_{j=1}^{n}\sum_{k \ne j}\sum_{i=1}^{m} E(Z_{ij}Z_{ik}) \\
&\quad + \sum_{j=1}^{n}\sum_{i=1}^{m}\sum_{h \ne i} E(Z_{ij}Z_{hj})
 + \sum_{j=1}^{n}\sum_{k \ne j}\sum_{i=1}^{m}\sum_{h \ne i} E(Z_{ij}Z_{hk}).
\end{aligned}
\]

Note that there are mn terms in the first of these sums, mn(n - 1) 
in the second, mn(m - 1) in the third, and mn(m - 1)(n - 1) in the 
fourth. When H_0 is true, we know that X_i, X_h, Y_j, and Y_k, i ≠ h, j ≠ k, 
are independent and have the same distribution of the continuous type. 
Thus Pr (X_i < Y_j) = 1/2. Moreover, Pr (X_i < Y_j, X_i < Y_k) = 1/3 because 
this is the probability that a designated one of three items is less than 
each of the other two. Similarly, Pr (X_i < Y_j, X_h < Y_j) = 1/3. Finally, 
Pr (X_i < Y_j, X_h < Y_k) = Pr (X_i < Y_j) Pr (X_h < Y_k) = 1/4. Hence we have 

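\[
\begin{aligned}
E(Z_{ij}^2) &= (1)^2 \Pr(X_i < Y_j) = \tfrac12, \\
E(Z_{ij}Z_{ik}) &= (1)(1)\Pr(X_i < Y_j, X_i < Y_k) = \tfrac13, \qquad j \ne k, \\
E(Z_{ij}Z_{hj}) &= (1)(1)\Pr(X_i < Y_j, X_h < Y_j) = \tfrac13, \qquad i \ne h,
\end{aligned}
\]

and 

\[
E(Z_{ij}Z_{hk}) = (1)(1)\Pr(X_i < Y_j, X_h < Y_k) = \tfrac14, \qquad i \ne h, \; j \ne k.
\]

Thus 

\[
E(U^2) = \frac{mn}{2} + \frac{mn(n-1)}{3} + \frac{mn(m-1)}{3} + \frac{mn(m-1)(n-1)}{4}
\]

and 

\[
\sigma_U^2 = mn\left[\frac12 + \frac{n-1}{3} + \frac{m-1}{3} + \frac{(m-1)(n-1)}{4} - \frac{mn}{4}\right]
= \frac{mn(m+n+1)}{12}.
\]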


Although it is fairly difficult to prove, it is true, when F(z) = G(z) for 
all z, that 

\[
\frac{U - \dfrac{mn}{2}}{\sqrt{\dfrac{mn(m+n+1)}{12}}}
\]

has, if each of m and n is large, an approximate distribution that is 
N(0, 1). This fact enables us to compute, approximately, various 
significance levels. 
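
The mean mn/2 and the variance mn(m + n + 1)/12 can be checked for small samples by listing every equally likely arrangement. The sketch below is one way to do so; the function names are hypothetical, and the enumeration is only feasible for small m and n.

```python
from fractions import Fraction
from itertools import combinations

def u_distribution(m, n):
    """Exact null distribution of U from all placements of the n values of Y."""
    counts = {}
    for y_ranks in combinations(range(1, m + n + 1), n):
        # The i-th smallest Y, sitting at rank r, exceeds r - i values of X.
        u = sum(rank - i for i, rank in enumerate(y_ranks, start=1))
        counts[u] = counts.get(u, 0) + 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}

def mean_and_variance(pmf):
    mean = sum(u * p for u, p in pmf.items())
    var = sum(u * u * p for u, p in pmf.items()) - mean ** 2
    return mean, var

m, n = 4, 3
mean, var = mean_and_variance(u_distribution(m, n))
print(mean, Fraction(m * n, 2))
print(var, Fraction(m * n * (m + n + 1), 12))
```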

Prior to the introduction of the statistic U in the statistical literature, 
it had been suggested that a test of H_0 : F(z) = G(z), for all z, be based 
upon the following statistic, say T (not Student's t). Let T be the sum 
of the ranks of Y_1, Y_2, ..., Y_n among the m + n items X_1, ..., X_m, 
Y_1, ..., Y_n, once this combined sample has been ordered. In Exercise 
11.35 the reader is asked to show that 

\[
U = T - \frac{n(n+1)}{2}.
\]

This formula provides another method of computing U and it shows 
that a test of H_0 based on U is equivalent to a test based on T. A 
generalization of T is considered in Section 11.8. 

Example 1. With the assumptions and the notation of this section, let 
m = 10 and n = 9. Let the observed values of X be as given in the first row 
and the observed values of Y as in the second row of the following display: 

4.3, 5.9, 4.9, 3.1, 5.3, 6.4, 6.2, 3.8, 7.5, 5.8, 

5.5, 7.9, 6.8, 9.0, 5.6, 6.3, 8.5, 4.6, 7.1. 

Since, in the combined sample, the ranks of the values of y are 4, 7, 8, 12, 14, 
15, 17, 18, 19, we have the experimental value of T to be equal to t = 114. Thus 
u = 114 - 45 = 69. If F(z) = G(z) for all z, then, approximately, 

\[
0.05 = \Pr\left(\frac{U - 45}{12.247} \ge 1.645\right) = \Pr(U \ge 65.146).
\]

Accordingly, at the 0.05 significance level, we reject the hypothesis H_0 : F(z) = 
G(z), for all z, and accept the alternative hypothesis H_1 : F(z) ≥ G(z), for 
all z. 
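
The arithmetic of Example 1 can be reproduced with a short script. This is a sketch only; it uses the data of the example, takes the upper 5 percent point of N(0, 1) from the standard library rather than the rounded value 1.645, and all variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

x = [4.3, 5.9, 4.9, 3.1, 5.3, 6.4, 6.2, 3.8, 7.5, 5.8]
y = [5.5, 7.9, 6.8, 9.0, 5.6, 6.3, 8.5, 4.6, 7.1]
m, n = len(x), len(y)

# Ranks of the y values in the combined ordered sample (no ties in these data).
combined = sorted(x + y)
y_ranks = sorted(combined.index(value) + 1 for value in y)

t = sum(y_ranks)                 # rank-sum statistic T
u = t - n * (n + 1) // 2         # U = T - n(n + 1)/2

mean_u = m * n / 2
sd_u = sqrt(m * n * (m + n + 1) / 12)
critical = mean_u + NormalDist().inv_cdf(0.95) * sd_u

print(y_ranks, t, u)
print(round(sd_u, 3), round(critical, 3), u >= critical)
```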


EXERCISES 

11.33. Compute the distribution of U in each of the following cases: (a) m = 2, 
n = 1; (b) m = 2, n = 2; (c) m = 1, n = 3; (d) m = 3, n = 1. 

11.34. Suppose that the hypothesis H_0 : F(z) = G(z), for all z, is not true. Let 
p = Pr (X_i < Y_j). Show that U/mn is an unbiased estimator of p and that 
it converges in probability to p as m → ∞ and n → ∞. 

11.35. Show that U = T - [n(n + 1)]/2. 

Hint: Let Y_(1) < Y_(2) < ... < Y_(n) be the order statistics of the random 
sample Y_1, Y_2, ..., Y_n. If R_i is the rank of Y_(i) in the combined ordered 
sample, note that Y_(i) is greater than R_i - i values of X. 

11.36. In Example 1 of this section assume that the values came from two 
normal distributions with means μ_1 and μ_2, respectively, and with common 
variance σ^2. Calculate the Student's t which is used to test the hypothesis 
H_0 : μ_1 = μ_2. If the alternative hypothesis is H_1 : μ_1 < μ_2, do we accept or 
reject H_0 at the 0.05 significance level? 

11.7 Distributions Under Alternative Hypotheses 

In this section we discuss certain problems that are related to a 
nonparametric test when the hypothesis H_0 is not true. Let X and Y 
be independent random variables of the continuous type with 
distribution functions F(x) and G(y), respectively, and probability 
density functions f(x) and g(y). Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n 
denote independent random samples from these distributions. 
Consider the hypothesis H_0 : F(z) = G(z) for all values of z. It has been 
seen that the test of this hypothesis may be based upon the statistic U, 
which, when the hypothesis H_0 is true, has a distribution that does not 
depend upon F(z) = G(z). Or this test can be based upon the statistic 
T = U + n(n + 1)/2, where T is the sum of the ranks of Y_1, Y_2, ..., Y_n 
in the combined sample. To elicit some information about the 
distribution of T when the alternative hypothesis is true, let us consider 
the joint distribution of the ranks of these values of Y. 

Let Y_(1) < Y_(2) < ... < Y_(n) be the order statistics of the sample 
Y_1, Y_2, ..., Y_n. Order the combined sample, and let R_i be the rank of 
Y_(i), i = 1, 2, ..., n. Thus there are i - 1 values of Y and R_i - i values 
of X that are less than Y_(i). Moreover, there are R_i - R_{i-1} - 1 values 
of X between Y_(i-1) and Y_(i). If it is given that Y_(1) = y_1 < Y_(2) = 
y_2 < ... < Y_(n) = y_n, then the conditional probability 

\[
\Pr(R_1 = r_1, R_2 = r_2, \ldots, R_n = r_n \mid y_1 < y_2 < \cdots < y_n), \tag{1}
\]

where r_1 < r_2 < ... < r_n ≤ m + n are positive integers, can be 
computed by using the multinomial p.d.f. in the following manner. 
Define the following sets: A_1 = {x : -∞ < x < y_1}, A_i = {x : y_{i-1} < 
x < y_i}, i = 2, ..., n, A_{n+1} = {x : y_n < x < ∞}. The conditional 
probabilities of these sets are, respectively, p_1 = F(y_1), p_2 = 
F(y_2) - F(y_1), ..., p_n = F(y_n) - F(y_{n-1}), p_{n+1} = 1 - F(y_n). Then the 
conditional probability of display (1) is given by 

\[
\frac{m!\; p_1^{\,r_1 - 1}\, p_2^{\,r_2 - r_1 - 1} \cdots p_n^{\,r_n - r_{n-1} - 1}\, p_{n+1}^{\,m + n - r_n}}
{(r_1 - 1)!\,(r_2 - r_1 - 1)! \cdots (r_n - r_{n-1} - 1)!\,(m + n - r_n)!}.
\]

To find the unconditional probability Pr (R_1 = r_1, R_2 = r_2, ..., 
R_n = r_n), which we denote simply by Pr (r_1, ..., r_n), we multiply the 
conditional probability by the joint p.d.f. of Y_(1) < Y_(2) < ... < Y_(n), 
namely n! g(y_1)g(y_2) ... g(y_n), and then integrate on y_1, y_2, ..., y_n. 
That is, 

\[
\Pr(r_1, r_2, \ldots, r_n) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{y_3}\int_{-\infty}^{y_2}
\Pr(r_1, \ldots, r_n \mid y_1 < \cdots < y_n)\; n!\, g(y_1) \cdots g(y_n)\; dy_1 \cdots dy_n,
\]

where Pr (r_1, ..., r_n | y_1 < ... < y_n) denotes the conditional probability 
in display (1). 

Now that we have the joint distribution of R_1, R_2, ..., R_n, we can 
find, theoretically, the distributions of functions of R_1, R_2, ..., R_n and, 
in particular, the distribution of $T = \sum_{i=1}^{n} R_i$. From the latter we can find 
that of U = T - n(n + 1)/2. To point out the extremely tedious 
computational problems of distribution theory that we encounter, we 
give an example. In this example we use the assumptions of this section. 

Example 1. Suppose that an hypothesis H_0 is not true but that in fact 
f(x) = 1, 0 < x < 1, zero elsewhere, and g(y) = 2y, 0 < y < 1, zero elsewhere. 
Let m = 3 and n = 2. Note that the space of U is the set {u : u = 0, 1, ..., 6}. 
Consider Pr (U = 5). This event U = 5 occurs when and only when R_1 = 3, 
R_2 = 5, since in this section R_1 < R_2 are the ranks of Y_(1) < Y_(2) in the combined 
sample and U = R_1 + R_2 - 3. Because F(x) = x, 0 ≤ x ≤ 1, we have 

\[
\Pr(U = 5) = \Pr(R_1 = 3, R_2 = 5)
= \int_0^1 \int_0^{y_2} \left[\frac{3!}{2!\,1!}\, y_1^2 (y_2 - y_1)\right] 2!\,(2y_1)(2y_2)\; dy_1\, dy_2
= \frac{6}{35}.
\]

Consider next Pr (U = 4). The event U = 4 occurs if R_1 = 2, R_2 = 5 or if 
R_1 = 3, R_2 = 4. Thus 

Pr (U = 4) = Pr (R_1 = 2, R_2 = 5) + Pr (R_1 = 3, R_2 = 4); 

the computation of each of these probabilities is similar to that of 
Pr (R_1 = 3, R_2 = 5). This procedure may be continued until we have computed 
Pr (U = u) for each u ∈ {u : u = 0, 1, ..., 6}. 
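
Since the exact integrations are tedious, a simulation offers a rough check on the probabilities of this example. The sketch below is a Monte Carlo estimate, not the method of the text: it draws X from the uniform distribution on (0, 1) and Y as the square root of a uniform variable, so that G(y) = y^2 and g(y) = 2y. The number of trials and the seed are arbitrary choices.

```python
import random

def simulate_u(trials=200_000, seed=0):
    """Estimate Pr(U = u), u = 0, ..., 6, for m = 3, n = 2 under the alternative."""
    rng = random.Random(seed)
    counts = [0] * 7
    for _ in range(trials):
        xs = [rng.random() for _ in range(3)]          # f(x) = 1 on (0, 1)
        ys = [rng.random() ** 0.5 for _ in range(2)]   # G(y) = y^2, g(y) = 2y
        u = sum(x < y for x in xs for y in ys)
        counts[u] += 1
    return [count / trials for count in counts]

estimates = simulate_u()
print([round(p, 4) for p in estimates])
print(round(6 / 35, 4))   # value of Pr(U = 5) given by the double integral
```

The estimate for u = 5 should fall close to the value obtained from the double integral, up to simulation error.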

In the preceding example the probability density functions and the 
sample sizes m and n were selected so as to provide relatively simple 
integrations. The reader can discover for himself or herself how 
tedious, and even difficult, the computations become if the sample sizes 
are large or if the probability density functions are not of a simple 
functional form. 

EXERCISES 


11.37. Let the probability density functions of X and Y be those given in 
Example 1 of this section. Further, let the sample sizes be m = 5 and n = 3. 
If R_1 < R_2 < R_3 are the ranks of Y_(1) < Y_(2) < Y_(3) in the combined sample, 
compute Pr (R_1 = 2, R_2 = 6, R_3 = 8). 

11.38. Let X_1, X_2, ..., X_m be a random sample of size m from a distribution 
of the continuous type with distribution function F(x) and p.d.f. 
F'(x) = f(x). Let Y_1, Y_2, ..., Y_n be a random sample from a distribution 
with distribution function G(y) = [F(y)]^θ, 0 < θ. If θ ≠ 1, this distribution 
is called a Lehmann alternative. With θ = 2, show that 

\[
\Pr(r_1, r_2, \ldots, r_n) = \frac{2^n\, r_1 (r_2 + 1)(r_3 + 2) \cdots (r_n + n - 1)}
{\binom{m+n}{m}(m + n + 1)(m + n + 2) \cdots (m + 2n)}.
\]


11.39. Let X_1, X_2, X_3 be a random sample from a continuous-type distribution 
with distribution function F(x) and p.d.f. f(x) = F'(x). Let Y_1, Y_2 be a 
random sample of size n = 2 from a distribution with distribution function 
G(y) = [F(y)]^2. In the combined sample of 5, determine the probability 
that the Y values have ranks 1 and 3; that is, the order is y x y x x. 

11.40. To generalize the results of Exercise 11.38, let G(y) = h[F(y)], where 
h(z) is a differentiable function such that h(0) = 0, h(1) = 1, and h'(z) > 0, 
0 < z < 1. Show that 

\[
\Pr(r_1, r_2, \ldots, r_n) = \frac{E[h'(V_{r_1})\, h'(V_{r_2}) \cdots h'(V_{r_n})]}{\binom{m+n}{m}},
\]

where V_1 < V_2 < ... < V_{m+n} are the order statistics of a random sample 
of size m + n from the uniform distribution over the interval (0, 1). 


11.8 Linear Rank Statistics 

In this section we consider a type of distribution-free statistic that 
is, among other things, a generalization of the Mann-Whitney-Wilcoxon 
statistic. Let V_1, V_2, ..., V_N be a random sample of 
size N from a distribution of the continuous type. Let R_i be the 
rank of V_i among V_1, V_2, ..., V_N, i = 1, 2, ..., N; and let c(i) 
be a scoring function defined on the first N positive integers; that is, 
let c(1), c(2), ..., c(N) be some appropriately selected constants. If 
a_1, a_2, ..., a_N are constants, then a statistic of the form 

\[
L = \sum_{i=1}^{N} a_i c(R_i)
\]

is called a linear rank statistic. 

To see that this type of statistic is actually a generalization of both 
the Mann-Whitney-Wilcoxon statistic and also that statistic 
associated with the median test, let N = m + n and 

V_1 = X_1, ..., V_m = X_m, V_{m+1} = Y_1, ..., V_N = Y_n. 

These two special statistics result from the following respective 
assignments for c(i) and a_1, a_2, ..., a_N: 

1. Take c(i) = i, a_1 = ... = a_m = 0 and a_{m+1} = ... = a_N = 1, so that 

\[
L = \sum_{i=1}^{N} a_i c(R_i) = \sum_{i=m+1}^{m+n} R_i,
\]

which is the sum of the ranks of Y_1, Y_2, ..., Y_n among the m + n 
observations (a statistic denoted by T in Section 11.6). 

2. Take c(i) = 1, provided that i ≤ (m + n)/2, zero otherwise. If 
a_1 = ... = a_m = 1 and a_{m+1} = ... = a_N = 0, then 

\[
L = \sum_{i=1}^{N} a_i c(R_i) = \sum_{i=1}^{m} c(R_i),
\]

which is equal to the number of the m values of X that are in the 
lower half of the combined sample of m + n observations (a statistic 
used in the median test of Section 11.5). 
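
The two assignments above can be tried out on a small data set. In this sketch the helper names and the data are hypothetical (m = 4 values of X followed by n = 3 values of Y, with no ties), and the scoring function is passed in as an ordinary Python function.

```python
def ranks(values):
    """Rank of each value within the list (1 = smallest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def linear_rank_statistic(values, a, c):
    """L = sum of a_i * c(R_i), where R_i is the rank of the i-th value."""
    return sum(a_i * c(r_i) for a_i, r_i in zip(a, ranks(values)))

x = [3, 1, 6, 4]          # hypothetical X sample, m = 4
y = [5, 7, 2]             # hypothetical Y sample, n = 3
v = x + y                 # V_1, ..., V_N with N = m + n
m, n = len(x), len(y)

# Assignment 1: c(i) = i, a = 0 on the X part and 1 on the Y part (rank sum T).
t = linear_rank_statistic(v, [0] * m + [1] * n, lambda i: i)

# Assignment 2: c(i) = 1 if i <= (m + n)/2, a = 1 on the X part (median test).
lower_half_count = linear_rank_statistic(
    v, [1] * m + [0] * n, lambda i: 1 if i <= (m + n) / 2 else 0)

print(t, t - n * (n + 1) // 2, lower_half_count)
```

The second printed value uses the relation U = T - n(n + 1)/2 from Section 11.6.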

To determine the mean and the variance of L, we make some 
observations about the joint and marginal distributions of the ranks 
R_1, R_2, ..., R_N. Clearly, from the results of Section 4.6 on the 
distribution of order statistics of a random sample, we observe that each 
permutation of the ranks has the same probability, 

\[
\Pr(R_1 = r_1, R_2 = r_2, \ldots, R_N = r_N) = \frac{1}{N!},
\]

where r_1, r_2, ..., r_N is any permutation of the first N positive 
integers. This implies that the marginal p.d.f. of R_i is 

\[
g_i(r_i) = \frac{1}{N}, \qquad r_i = 1, 2, \ldots, N,
\]

zero elsewhere, because the number of permutations in which R_i = r_i 
is (N - 1)! so that 

\[
\sum \cdots \sum \frac{1}{N!} = \frac{(N-1)!}{N!} = \frac{1}{N}.
\]

In a similar manner, the joint marginal p.d.f. of R_i and R_j, i ≠ j, is 

\[
g_{ij}(r_i, r_j) = \frac{1}{N(N-1)}, \qquad r_i \ne r_j,
\]

zero elsewhere. That is, the (N - 2)-fold summation 

\[
\sum \cdots \sum \frac{1}{N!} = \frac{(N-2)!}{N!} = \frac{1}{N(N-1)},
\]

where the summation is over all permutations in which R_i = r_i and 
R_j = r_j. 

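Among other things these properties of the distribution of 
R_1, R_2, ..., R_N imply that 

\[
E[c(R_i)] = \sum_{r_i = 1}^{N} c(r_i)\left(\frac{1}{N}\right) = \frac{c(1) + \cdots + c(N)}{N}.
\]

If, for convenience, we let c(k) = c_k, then 

\[
E[c(R_i)] = \sum_{k=1}^{N} \frac{c_k}{N} = \bar{c},
\]

say, for all i = 1, 2, ..., N. In addition, we have that 

\[
\operatorname{var}[c(R_i)] = \sum_{r_i = 1}^{N} [c(r_i) - \bar{c}]^2 \left(\frac{1}{N}\right)
= \sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N}
\]

for all i = 1, 2, ..., N. 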

A simple expression for the covariance of c(R_i) and c(R_j), i ≠ j, is 
a little more difficult to determine. That covariance is 

\[
E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} = \mathop{\sum\sum}_{k \ne h} \frac{(c_k - \bar{c})(c_h - \bar{c})}{N(N-1)}.
\]

However, since 

\[
0 = \left[\sum_{k=1}^{N} (c_k - \bar{c})\right]^2
= \sum_{k=1}^{N} (c_k - \bar{c})^2 + \mathop{\sum\sum}_{k \ne h} (c_k - \bar{c})(c_h - \bar{c}),
\]

the covariance can be written simply as 

\[
E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} = -\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}.
\]

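With these results, we first observe that the mean of L is 

\[
\mu_L = E\left[\sum_{i=1}^{N} a_i c(R_i)\right] = \sum_{i=1}^{N} a_i E[c(R_i)] = \sum_{i=1}^{N} a_i \bar{c} = N\bar{a}\bar{c},
\]

where $\bar{a} = (\sum a_i)/N$. Second, note that the variance of L is 

\[
\begin{aligned}
\sigma_L^2 &= \sum_{i=1}^{N} a_i^2 \operatorname{var}[c(R_i)] + \mathop{\sum\sum}_{i \ne j} a_i a_j E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} \\
&= \sum_{i=1}^{N} a_i^2 \sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N}
 + \mathop{\sum\sum}_{i \ne j} a_i a_j \left[-\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}\right] \\
&= \left[\frac{\sum_{k=1}^{N} (c_k - \bar{c})^2}{N(N-1)}\right]
 \left[(N-1)\sum_{i=1}^{N} a_i^2 - \mathop{\sum\sum}_{i \ne j} a_i a_j\right].
\end{aligned}
\]

However, we can determine a substitute for the second factor by 
observing that 

\[
\begin{aligned}
N \sum_{i=1}^{N} (a_i - \bar{a})^2 &= N \sum_{i=1}^{N} a_i^2 - \left(\sum_{i=1}^{N} a_i\right)^2 \\
&= N \sum_{i=1}^{N} a_i^2 - \left[\sum_{i=1}^{N} a_i^2 + \mathop{\sum\sum}_{i \ne j} a_i a_j\right] \\
&= (N-1)\sum_{i=1}^{N} a_i^2 - \mathop{\sum\sum}_{i \ne j} a_i a_j.
\end{aligned}
\]

So, making this substitution in $\sigma_L^2$, we finally have that 

\[
\sigma_L^2 = \left[\frac{\sum_{k=1}^{N} (c_k - \bar{c})^2}{N(N-1)}\right]\left[N \sum_{i=1}^{N} (a_i - \bar{a})^2\right]
= \frac{1}{N-1} \sum_{i=1}^{N} (a_i - \bar{a})^2 \sum_{k=1}^{N} (c_k - \bar{c})^2.
\]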
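
The formulas for the mean and the variance of L can be checked for small N by listing all N! equally likely permutations of the ranks. The sketch below does this with exact fractions; the function names are hypothetical, and the constants a_i = i and scores c_k = k with N = 5 are chosen only as an illustration.

```python
from fractions import Fraction
from itertools import permutations

def moments_by_enumeration(a, c):
    """Mean and variance of L over all equally likely rank permutations."""
    N = len(a)
    values = [sum(a_i * c[r - 1] for a_i, r in zip(a, perm))
              for perm in permutations(range(1, N + 1))]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var

def moments_by_formula(a, c):
    """Mean N*a_bar*c_bar and variance from the final display above."""
    N = len(a)
    a_bar = Fraction(sum(a), N)
    c_bar = Fraction(sum(c), N)
    mean = N * a_bar * c_bar
    var = (sum((a_i - a_bar) ** 2 for a_i in a)
           * sum((c_k - c_bar) ** 2 for c_k in c)) / (N - 1)
    return mean, var

a = [1, 2, 3, 4, 5]   # constants a_i = i
c = [1, 2, 3, 4, 5]   # scores c_k = k
print(moments_by_enumeration(a, c))
print(moments_by_formula(a, c))
```

The enumeration grows as N!, so it is useful only as a check for very small N.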

In the special case in which N = m + n and 

\[
L = \sum_{i=m+1}^{N} c(R_i),
\]

the reader is asked to show that (Exercise 11.41) 

\[
\mu_L = n\bar{c}, \qquad \sigma_L^2 = \frac{mn}{N(N-1)} \sum_{k=1}^{N} (c_k - \bar{c})^2.
\]

A further simplification when c_k = c(k) = k yields 

\[
\mu_L = \frac{n(m+n+1)}{2}, \qquad \sigma_L^2 = \frac{mn(m+n+1)}{12};
\]

these latter are, respectively, the mean and the variance of the statistic 
T as defined in Section 11.6. 

As in the case of the Mann-Whitney-Wilcoxon statistic, the 
determination of the exact distribution of a linear rank statistic L 
can be very difficult. However, for many selections of the constants 
a_1, a_2, ..., a_N and the scores c(1), c(2), ..., c(N), the ratio $(L - \mu_L)/\sigma_L$ 
has, for large N, an approximate distribution that is N(0, 1). This 
approximation is better if the scores c(k) = c_k are like an ideal sample 
from a normal distribution, in particular, symmetric and without 
extreme values. For example, use of normal scores defined by 

\[
\frac{k}{N+1} = \int_{-\infty}^{c_k} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) dw
\]

makes the approximation better. However, even with the use of ranks, 
c(k) = k, the approximation is reasonably good, provided that N is 
large enough, say around 30 or greater. 
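
The defining equation for normal scores says that c_k is the k/(N + 1) quantile of the standard normal distribution. A minimal sketch, using the inverse distribution function from the Python standard library (the function name `normal_scores` is hypothetical):

```python
from statistics import NormalDist

def normal_scores(N):
    """Scores c_1, ..., c_N with Phi(c_k) = k / (N + 1)."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(k / (N + 1)) for k in range(1, N + 1)]

scores = normal_scores(9)
print([round(score, 3) for score in scores])
```

The scores are symmetric about zero and increase with k, which matches the description of an ideal sample from a normal distribution.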

Linear rank statistics do more than generalize statistics such as those of 
Mann, Whitney, and Wilcoxon; we give two additional applications of 
linear rank statistics in the following illustrations. 

Example 1. Let X_1, X_2, ..., X_n denote n random variables. However, 
suppose that we question whether they are observations of a random sample 
due either to possible lack of independence or to the fact that X_1, X_2, ..., X_n 
might not have the same distributions. In particular, say we suspect a trend 
toward larger and larger values in the sequence X_1, X_2, ..., X_n. If 
R_i = rank (X_i), a statistic that could be used to test the alternative (trend) 
hypothesis is $L = \sum_{i=1}^{n} i R_i$. Under the assumption (H_0) that the n random 
variables are actually observations of a random sample from a distribution 
of the continuous type, the reader is asked to show that (Exercise 11.42) 

\[
\mu_L = \frac{n(n+1)^2}{4}, \qquad \sigma_L^2 = \frac{n^2 (n+1)^2 (n-1)}{144}.
\]

The critical region of the test is of the form L ≥ d, and the constant d can be 
determined either by using the normal approximation or referring to a 
tabulated distribution of L so that Pr (L ≥ d; H_0) is approximately equal to 
a desired significance level α. 

Example 2. Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a random sample from 
a bivariate distribution of the continuous type. Let R_i be the rank of 
X_i among X_1, X_2, ..., X_n and Q_i be the rank of Y_i among Y_1, Y_2, ..., Y_n. 
If X and Y have a large positive correlation coefficient, we would anticipate 
that R_i and Q_i would tend to be large or small together. In particular, 
the correlation coefficient of (R_1, Q_1), (R_2, Q_2), ..., (R_n, Q_n), namely the 
Spearman rank correlation coefficient, 

\[
\frac{\sum_{i=1}^{n} (R_i - \bar{R})(Q_i - \bar{Q})}
{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \sum_{i=1}^{n} (Q_i - \bar{Q})^2}},
\]

would tend to be large. Since R_1, R_2, ..., R_n and Q_1, Q_2, ..., Q_n are 
permutations of 1, 2, ..., n, this correlation coefficient can be shown 
(Exercise 11.43) to equal 

\[
\frac{\sum_{i=1}^{n} R_i Q_i - n(n+1)^2/4}{n(n^2 - 1)/12},
\]

which in turn equals 

\[
1 - \frac{6 \sum_{i=1}^{n} (R_i - Q_i)^2}{n(n^2 - 1)}.
\]

From the first of these two additional expressions for Spearman's statistic, it 
is clear that $\sum_{i=1}^{n} R_i Q_i$ is an equivalent statistic for the purpose of testing the 
independence of X and Y, say H_0. However, note that if H_0 is true, then the 
distribution of $\sum_{i=1}^{n} Q_i R_i$, which is not a linear rank statistic, and $L = \sum_{i=1}^{n} i R_i$ 
are the same. The reason for this is that the ranks R_1, R_2, ..., R_n and the ranks 
Q_1, Q_2, ..., Q_n are independent because of the independence of X and Y. 
Hence, under H_0, pairing R_1, R_2, ..., R_n at random with 1, 2, ..., n is 
distributionally equivalent to pairing those ranks with Q_1, Q_2, ..., Q_n, which 
is simply a permutation of 1, 2, ..., n. The mean and the variance of L are 
given in Example 1. 
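
The agreement of the three expressions for Spearman's statistic can be illustrated numerically. In the sketch below the two rank vectors are hypothetical permutations of 1, ..., 5, and the function name is chosen only for illustration.

```python
from math import sqrt

def spearman_three_ways(r, q):
    """Evaluate the three expressions for Spearman's rank correlation."""
    n = len(r)
    r_bar = sum(r) / n
    q_bar = sum(q) / n
    first = (sum((ri - r_bar) * (qi - q_bar) for ri, qi in zip(r, q))
             / sqrt(sum((ri - r_bar) ** 2 for ri in r)
                    * sum((qi - q_bar) ** 2 for qi in q)))
    second = ((sum(ri * qi for ri, qi in zip(r, q)) - n * (n + 1) ** 2 / 4)
              / (n * (n * n - 1) / 12))
    third = 1 - 6 * sum((ri - qi) ** 2 for ri, qi in zip(r, q)) / (n * (n * n - 1))
    return first, second, third

r = [1, 2, 3, 4, 5]   # hypothetical ranks of the X values
q = [2, 1, 4, 3, 5]   # hypothetical ranks of the Y values
print(spearman_three_ways(r, q))
```

The three values coincide only because both rank vectors are permutations of 1, ..., n; with ties the expressions can differ.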

EXERCISES 


11.41. Use the notation of this section. 

(a) Show that the mean and the variance of $L = \sum_{i=m+1}^{N} c(R_i)$ are equal to 
the expressions in the text. 

(b) In the special case in which $L = \sum_{i=m+1}^{N} R_i$, show that $\mu_L$ and $\sigma_L^2$ are 
those of T considered in Section 11.6. 
Hint: Recall that 

\[
\sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}.
\]


11.42. If X_1, X_2, ..., X_n is a random sample from a distribution of the 
continuous type and if R_i = rank (X_i), show that the mean and the variance 
of $L = \sum i R_i$ are n(n + 1)^2/4 and n^2(n + 1)^2(n - 1)/144, respectively. 

11.43. Verify that the two additional expressions, given in Example 2, for the 
Spearman rank correlation coefficient are equivalent to the first one. 

Hint: $\sum R_i^2 = n(n+1)(2n+1)/6$ and $\sum (R_i - Q_i)^2/2 = \sum (R_i^2 + Q_i^2)/2 - \sum R_i Q_i$. 

11.44. Let X_1, X_2, ..., X_6 be a random sample of size n = 6 from a 
distribution of the continuous type. Let R_i = rank (X_i) and take a_1 = a_6 = 9, 
a_2 = a_5 = 4, a_3 = a_4 = 1. Find the mean and the variance of $L = \sum_{i=1}^{6} a_i R_i$, 
a statistic that could be used to detect a parabolic trend in X_1, X_2, ..., X_6. 

11.45. Let R_i be the rank of X_i, i = 1, 2, ..., 9. The statistic 
W = (R_1 + R_2 + R_3) + 2(R_4 + R_5 + R_6) + 3(R_7 + R_8 + R_9) is used to test 
a trend in the data. If, in fact, X_1, X_2, ..., X_9 are observations of a random 
sample from a continuous-type distribution, what are the mean and the 
variance of W? 

11.46. Let X_1, X_2, X_3, X_4, X_5 be a random sample of size n = 5 from a 
continuous-type distribution. Let R_i be the rank of X_i, i = 1, 2, 3, 4, 5. 

(a) Compute the mean and the variance of L = R_5 - R_1. 

(b) Find the distribution of L. 

11.47. In the notation of this section show that the covariance of the two 
linear rank statistics, $L_1 = \sum_{i=1}^{N} a_i c(R_i)$ and $L_2 = \sum_{i=1}^{N} b_i d(R_i)$, is equal to 

\[
\sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b}) \sum_{k=1}^{N} (c_k - \bar{c})(d_k - \bar{d}) \Big/ (N - 1),
\]

where, for convenience, d_k = d(k). 

11.9 Adaptive Nonparametric Methods 

Frequently, an investigator is tempted to evaluate several test 
statistics associated with a single hypothesis and then use the one 
statistic that best supports his or her position, usually rejection. 
Obviously, this type of procedure changes the actual significance level 
of the test from the nominal $\alpha$ that is used. However, there is a way in
which the investigator can first look at the data and then select a test
statistic without changing this significance level. For illustration,
suppose there are three possible test statistics $W_1$, $W_2$, $W_3$ of the
hypothesis $H_0$ with respective critical regions $C_1$, $C_2$, $C_3$ such that
$\Pr(W_i \in C_i; H_0) = \alpha$, $i = 1, 2, 3$. Moreover, suppose that a statistic $Q$,
based upon the same data, selects one and only one of the statistics $W_1$,
$W_2$, $W_3$, and that $W$ is then used to test $H_0$. For example, we choose
to use the test statistic $W_i$ if $Q \in D_i$, $i = 1, 2, 3$, where the events defined
by $D_1$, $D_2$, and $D_3$ are mutually exclusive and exhaustive. Now if $Q$ and
each $W_i$ are independent when $H_0$ is true, then the probability of
rejection, using the entire procedure (selecting and testing), is, under
$H_0$,

$$
\begin{aligned}
&\Pr(Q \in D_1, W_1 \in C_1) + \Pr(Q \in D_2, W_2 \in C_2) + \Pr(Q \in D_3, W_3 \in C_3) \\
&\quad = \Pr(Q \in D_1)\Pr(W_1 \in C_1) + \Pr(Q \in D_2)\Pr(W_2 \in C_2) + \Pr(Q \in D_3)\Pr(W_3 \in C_3) \\
&\quad = \alpha[\Pr(Q \in D_1) + \Pr(Q \in D_2) + \Pr(Q \in D_3)] = \alpha.
\end{aligned}
$$

That is, the procedure of selecting $W_i$ using an independent statistic $Q$
and then constructing a test of significance level $\alpha$ with the statistic $W_i$
has overall significance level $\alpha$.
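
The calculation above can be checked numerically. The following is a minimal sketch, not a procedure from the text: it assumes a single standard normal summary `z` under $H_0$, three hypothetical level-0.05 tests built from it, and a selector `q` drawn independently of `z` (an artificial stand-in for an independent statistic $Q$). The regions $D_1 = [0, 0.2)$, $D_2 = [0.2, 0.7)$, $D_3 = [0.7, 1)$ and all function names are illustrative assumptions. For contrast, it also estimates the rejection rate when the investigator simply uses whichever statistic rejects.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
Z_ONE = NormalDist().inv_cdf(1 - ALPHA)      # one-sided cutoff for level 0.05
Z_TWO = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided cutoff for level 0.05


def rejections(z):
    """Indicators of W1 in C1, W2 in C2, W3 in C3; each has probability ALPHA under H0."""
    return (z > Z_ONE, -z > Z_ONE, abs(z) > Z_TWO)


def select(q):
    """Index of the test chosen by the selector: D1=[0,0.2), D2=[0.2,0.7), D3=[0.7,1)."""
    if q < 0.2:
        return 0
    if q < 0.7:
        return 1
    return 2


def simulate(trials=200_000, seed=1):
    rng = random.Random(seed)
    adaptive = 0   # rejections when the test is chosen by the independent selector
    cherry = 0     # rejections when any rejecting statistic is used
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # data summary under H0
        q = rng.random()         # selector, independent of z by construction
        r = rejections(z)
        adaptive += r[select(q)]
        cherry += any(r)
    return adaptive / trials, cherry / trials


adaptive_level, cherry_picked_level = simulate()
print(f"adaptive: {adaptive_level:.3f}, cherry-picked: {cherry_picked_level:.3f}")
```

Under these assumptions the adaptive rate should be close to the nominal 0.05, while the cherry-picked rate should be close to 0.10, since some one of the three statistics rejects whenever $|z|$ exceeds the one-sided cutoff.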

Of course, the important element in this procedure is the ability to
find a selector $Q$ that is independent of each test statistic $W$.
This can frequently be done by using the fact that the complete
sufficient statistics for the parameters, given by $H_0$, are independent of
every statistic whose distribution is free of those parameters. For
illustration, if independent random samples of sizes $m$ and $n$ arise from
two normal distributions with respective means $\mu_1$ and $\mu_2$ and common
variance $\sigma^2$, then the complete sufficient statistics $\bar{X}$, $\bar{Y}$, and

$$V = \sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

for $\mu_1$, $\mu_2$, and $\sigma^2$ are independent of every statistic whose distribution
is free of $\mu_1$, $\mu_2$, and $\sigma^2$, such as

$$\frac{\sum_{i=1}^{m} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}, \qquad
\frac{\sum_{i=1}^{m} |X_i - \operatorname{median}(X_i)|}{\sum_{i=1}^{n} |Y_i - \operatorname{median}(Y_i)|}, \qquad
\frac{\operatorname{range}(X_1, X_2, \ldots, X_m)}{\operatorname{range}(Y_1, Y_2, \ldots, Y_n)}.$$

Thus, in general, we would hope to be able to find a selector Q that 
is a function of the complete sufficient statistics for the parameters, 
under H 0 , so that it is independent of the test statistics. 

It is particularly interesting to note that it is relatively easy to use 
this technique in nonparametric methods by using the independence
result based upon complete sufficient statistics for parameters. How can 
we use an argument depending on parameters in nonparametric 
methods? Although this does sound strange, it is due to the unfortunate 
choice of a name in describing this broad area of nonparametric 
methods. Most statisticians would prefer to describe the subject as 
being distribution-free, since the test statistics have distributions that 
do not depend on the underlying distribution of the continuous type, 
described by either the distribution function $F$ or the p.d.f. $f$. In
addition, the latter name provides the clue for our application here
because we have many test statistics whose distributions are free of the
unknown (infinite vector) "parameter" $F$ (or $f$). We now must find
complete sufficient statistics for the distribution function F of the 
continuous type. In many instances, this is easy to do. 

In Exercise 7.50, Section 7.7, it is shown that the order statistics
$Y_1 < Y_2 < \cdots < Y_n$ of a random sample of size $n$ from a distribution
of the continuous type with p.d.f. $F'(x) = f(x)$ are sufficient statistics
for the "parameter" $f$ (or $F$). Moreover, if the family of distributions
contains all probability density functions of the continuous type,
the family of joint probability density functions of $Y_1, Y_2, \ldots, Y_n$ is
also complete. We accept this latter fact without proof, as it is beyond
the level of this text; but doing so, we can now say that the order
statistics $Y_1, Y_2, \ldots, Y_n$ are complete sufficient statistics for the
parameter $f$ (or $F$).

Accordingly, our selector Q will be based upon those complete 
sufficient statistics, the order statistics under H 0 . This allows us to 
independently choose a distribution-free test appropriate for this type 
of underlying distribution, and thus increase our power. Although it 
is well known that distribution-free tests hold the significance level $\alpha$
for all underlying distributions of the continuous type, they have often
been criticized because their powers are sometimes low. The
independent selection of the distribution-free test to be used can help
correct this. So selecting, or adapting, the test to the data provides
a new dimension to nonparametric tests, which usually improves the
power of the overall test.

A statistical test that maintains the significance level close to a
desired significance level $\alpha$ for a wide variety of underlying distributions
with good (not necessarily the best for any one type of distribution)
power for all these distributions is described as being robust. As an
illustration, the $T$ (Student's $t$) used to test the equality of the means
of two normal distributions is quite robust provided that the underlying
distributions are rather close to normal ones with common variance.
However, if the class of distributions includes those that are not too
close to normal ones, such as the Cauchy distribution, the test based
upon $T$ is not robust; the significance level is not maintained and the
power of the $t$-test is low with Cauchy distributions. As a matter of
fact, the test based on the Mann-Whitney-Wilcoxon statistic (Section 
11.6) is a much more robust test than that based upon T if the class 
of distributions is fairly wide (in particular, if long-tailed distributions 
such as the Cauchy are included). 

An illustration of this adaptive distribution-free procedure that is 
robust is provided by considering a test of the equality of two 
distributions of the continuous type. From the discussion in Section 
11.8, we know that we could construct many linear rank statistics by 
changing the scoring function. However, we concentrate on three such 
statistics mentioned explicitly in that section: that based on normal
scores, say $L_1$; that of Mann-Whitney-Wilcoxon, say $L_2$; and that of
the median test, say $L_3$. Moreover, respective critical regions $C_1$, $C_2$,
and $C_3$ are selected so that, under the equality of the two distributions,
we have

$$\alpha = \Pr(L_1 \in C_1) = \Pr(L_2 \in C_2) = \Pr(L_3 \in C_3).$$

Of course, we would like to use the test given by $L_1 \in C_1$ if the tails of
the distributions are like or shorter than those of the normal
distributions. With distributions having somewhat longer tails, $L_2 \in C_2$
provides an excellent test. And with distributions having very long tails,
the test based on $L_3 \in C_3$ is quite satisfactory.

In order to select the appropriate test in an independent manner we
let $V_1 < V_2 < \cdots < V_N$, where $N = m + n$, be the order statistics of
the combined sample, which is of size $N$. Recall that if the two
distributions are equal and thus have the same distribution function
$F$, these order statistics are the complete sufficient statistics for the
parameter $F$. Hence every statistic based on $V_1, V_2, \ldots, V_N$ is
independent of $L_1$, $L_2$, and $L_3$, since the latter statistics have distributions
that do not depend upon $F$. In particular, the kurtosis (Exercise 1.102,
Section 1.9) of the combined sample,

$$K = \frac{\dfrac{1}{N} \sum_{i=1}^{N} (V_i - \bar{V})^4}{\left[\dfrac{1}{N} \sum_{i=1}^{N} (V_i - \bar{V})^2\right]^2},$$

is independent of $L_1$, $L_2$, and $L_3$. From Exercise 3.64, Section 3.4, we
know that the kurtosis of the normal distribution is 3; hence if the two
distributions were equal and normal we would expect $K$ to be about
3. Of course, a longer-tailed distribution has a bigger kurtosis. Thus
one simple way of defining the independent selection procedure would
be to let

$$D_1 = \{k : k \le 3\}, \quad D_2 = \{k : 3 < k \le 8\}, \quad D_3 = \{k : 8 < k\}.$$

These choices are not necessarily the best way of selecting the 
appropriate test, but they are reasonable and illustrative of the
adaptive procedure. From the independence of $K$ and $(L_1, L_2, L_3)$, we
know that the overall test has significance level $\alpha$. Since a more
appropriate test has been selected, the power will be relatively good 
throughout a wide range of distributions. Accordingly, this 
distribution-free adaptive test is robust. 
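
The adaptive procedure just described can be sketched in code. This is an illustrative sketch under stated assumptions rather than the text's own algorithm: it assumes no ties, takes the linear rank statistic to be the sum of scores of the second sample's ranks in the combined sample, uses the scoring functions $\Phi^{-1}(k/(N+1))$ (normal scores), $k$ (Wilcoxon), and an indicator of ranks above $(N+1)/2$ (median test), and reports a one-sided upper-tail p-value by enumerating all equally likely rank assignments, which is only practical for small samples. The function names are hypothetical.

```python
import itertools
from statistics import NormalDist


def kurtosis(values):
    """Sample kurtosis K: fourth central moment over squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def select_scores(k):
    """Selector based on D1 = {k <= 3}, D2 = {3 < k <= 8}, D3 = {8 < k}."""
    if k <= 3:
        return "normal_scores"
    if k <= 8:
        return "wilcoxon"
    return "median"


def score_values(kind, N):
    """Scores c(1), ..., c(N) for the chosen linear rank statistic (assumed forms)."""
    if kind == "normal_scores":
        return [NormalDist().inv_cdf(r / (N + 1)) for r in range(1, N + 1)]
    if kind == "wilcoxon":
        return [float(r) for r in range(1, N + 1)]
    return [1.0 if r > (N + 1) / 2 else 0.0 for r in range(1, N + 1)]


def adaptive_test(x, y):
    """Return (selected test, observed statistic, exact upper-tail p-value)."""
    combined = sorted(x + y)                 # order statistics V_1 < ... < V_N
    N, n = len(combined), len(y)
    kind = select_scores(kurtosis(combined))  # depends on the order statistics only
    c = score_values(kind, N)
    position = {v: i for i, v in enumerate(combined)}  # 0-based rank of each value
    observed = sum(c[position[v]] for v in y)
    # Under H0 every set of n ranks for the y-sample is equally likely.
    total = count = 0
    for subset in itertools.combinations(range(N), n):
        total += 1
        if sum(c[i] for i in subset) >= observed - 1e-12:
            count += 1
    return kind, observed, count / total


kind, statistic, p_value = adaptive_test([0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4])
print(kind, statistic, p_value)
```

Because the selector is computed from the combined order statistics alone, while the p-value comes from the rank assignment, the choice of scoring function does not disturb the null distribution used for the selected test.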

EXERCISES 

11.48. Let $F(x)$ be a distribution function of a distribution of the continuous
type which is symmetric about its median $\xi$. We wish to test $H_0: \xi = 0$
against $H_1: \xi > 0$. Use the fact that the $2n$ values, $X_i$ and $-X_i$,
$i = 1, 2, \ldots, n$, after ordering, are complete sufficient statistics for $F$,
provided that $H_0$ is true. Then construct an adaptive distribution-free test
based upon Wilcoxon's statistic and two of its modifications given in
Exercises 11.23 and 11.24.

11.49. Suppose that the hypothesis $H_0$ concerns the independence of two
random variables $X$ and $Y$. That is, we wish to test $H_0: F(x, y) = F_1(x)F_2(y)$,
where $F$, $F_1$, and $F_2$ are the respective joint and marginal distribution
functions of the continuous type, against all alternatives. Let $(X_1, Y_1)$,
$(X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from the joint distribution.
Under $H_0$, the order statistics of $X_1, X_2, \ldots, X_n$ and the order statistics of
$Y_1, Y_2, \ldots, Y_n$ are, respectively, complete sufficient statistics for $F_1$ and $F_2$.
Use Spearman’s statistic (Example 2, Section 11.8) and at least two 
modifications of it to create an adaptive distribution-free test of H 0 . 

Hint: Instead of ranks, use normal and median scores (Section 11.8) to 
obtain two additional correlation coefficients. The one associated with the 
median scores is frequently called the quadrant test. 


ADDITIONAL EXERCISES 

11.50. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size n = 5 from a distribution of the continuous type with distribution 
function F. Compute Pr {f{Y 2 ) + [1 - 汽 F 4 )] >{}. 

11.51. Let $Y_1 < Y_2 < \cdots < Y_8$ be the order statistics of a random sample of
size $n = 8$ from a distribution of the continuous type with median $\xi_{0.5}$.
Compute $\Pr(Y_2 < \xi_{0.5} < Y_7)$.

11.52. Let $X_1, X_2, \ldots, X_8$ be a random sample of size $n = 8$ from a symmetric
distribution of the continuous type with distributional median equal to
zero. Modify the regular one-sample Wilcoxon $W$ by replacing the ranks
1, 2, 3, 4, 5, 6, 7, 8 by the scores 1, 1, 2, 2, 2, 2, 3, 3, to obtain $W_g$. Compute
the mean and variance of $W_g$.

11.53. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of
size $n = 4$ from a continuous-type distribution with distribution function
$F(x)$ and unknown 75th percentile $\xi_{0.75}$.

(a) What is $\Pr(Y_3 \le \xi_{0.75} < Y_4)$?

(b) What is the p.d.f. of $V = F(Y_4) - F(Y_3)$?

11.54. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a continuous-type
distribution that is symmetric about zero. If we modify the one-sample
Wilcoxon by replacing the ranks 1, 2, 3, 4, 5 by the scores 1, 1, 1, 3, 4, what
is the m.g.f. of this new statistic?

11.55. Let $X_1, X_2$ and $Y_1, Y_2$ be independent random samples, each of size
$n = 2$, from distributions with respective probability density functions
$f(x) = 1$, $0 < x < 1$, zero elsewhere, and $g(y) = 3y^2$, $0 < y < 1$, zero
elsewhere. Compute the probability that the ranks of the $Y$-values in the
combined sample of size 4 are 2 and 4.

11.56. Let $X_1, X_2, \ldots, X_6$ be a random sample of size $n = 6$ from a
continuous-type distribution with distribution function $F(x)$. Let
$R_i = \operatorname{rank}(X_i)$ and consider the scores $c(1) = c(2) = 1$, $c(3) = 2$, $c(4) = 3$,
$c(5) = c(6) = 4$.

(a) What are the mean and the variance of L - = ciR,) + c(R s ) + c ⑽

(b) Why are $L$ and the sample range $R = \max(X_i) - \min(X_i)$ independent?

11.57. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a distribution with
p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Find the probability that both
$X_1$ and $X_5$ are less than $X_2$, $X_3$, and $X_4$. Is this answer the same for every
underlying distribution of the continuous type?

11.58. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from a
distribution of the continuous type. Let $R_i = \operatorname{rank}(X_i)$ among
$X_1, \ldots, X_5$. Find the mean and variance of

$$L = R_1 + 2(R_2 + R_3 + R_4) + 3R_5.$$

11.59. Let $X_1, X_2, X_3$ denote a random sample of size $n = 3$ from a
continuous-type distribution with distribution function $F$. It is well known
that the order statistics $Y_1 < Y_2 < Y_3$ are complete sufficient statistics for $F$.

(a) Let the statistic $U$ be defined as follows:

$$U = \begin{cases} 1 & \text{if } X_1 \text{ is the sample median } Y_2, \\ 0 & \text{otherwise.} \end{cases}$$

Find the distribution of $U$.

(b) Argue that $U$ and $(Y_1, Y_2, Y_3)$ are independent.

(c) Argue that $U$ and $\bar{X}$ are independent.

11.60. Let $X$ have a p.d.f. $f(x)$ of the continuous type that is symmetric about
zero; that is, $f(x) = f(-x)$ for all real $x$. Show that the joint m.g.f.

$$E\{\exp[t_1 |X| + t_2 \operatorname{sign}(X)]\} = \left[\frac{e^{t_2} + e^{-t_2}}{2}\right] \left[\int_0^{\infty} 2 e^{t_1 x} f(x)\, dx\right],$$

and thus $|X|$ and $\operatorname{sign}(X)$ are independent.

11.61. Let $X - \theta \stackrel{d}{=} \theta - X$ mean that $X - \theta$ has the same distribution as
$\theta - X$; thus $X$ has a symmetric distribution about $\theta$. Say that $Y$ and $X$ are
independent random variables and $Y$ has a distribution which is also
symmetric about $\theta$. Show that $X - Y$ has a distribution that is symmetric
about zero.

Hint: Write $X - Y = X - \theta - (Y - \theta) \stackrel{d}{=} \theta - X - (\theta - Y) = Y - X$.

11.62. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that is
symmetric about $\theta$. Let $W$ and $T$ be two statistics enjoying the following
properties, respectively:

$$W(x_1, \ldots, x_n) = W(-x_1, \ldots, -x_n),$$

$$W(x_1 + h, \ldots, x_n + h) = W(x_1, \ldots, x_n),$$

so that $W$ is an even location invariant statistic like $S^2$ or the range;

$$T(x_1, \ldots, x_n) = -T(-x_1, \ldots, -x_n),$$

$$T(x_1 + h, \ldots, x_n + h) = T(x_1, \ldots, x_n) + h,$$

so that $T$ is an odd location statistic like $\bar{X}$ or the median. Show that

$$[T(X_1, \ldots, X_n) - \theta,\ W(X_1, \ldots, X_n)] \stackrel{d}{=} [\theta - T(X_1, \ldots, X_n),\ W(X_1, \ldots, X_n)].$$

Hint: Write the left-hand member as

$$[T(X_1 - \theta, \ldots, X_n - \theta),\ W(X_1 - \theta, \ldots, X_n - \theta)]$$

using the properties of $T$ and $W$. Then use the fact that $X_i - \theta \stackrel{d}{=} \theta - X_i$ and substitute.

11.63. The result of Exercise 11.62 implies that $T$ has a conditional
distribution that is symmetric about $\theta$, given $W = w$. Of course, $T$ has an
unconditional symmetric distribution about $\theta$. Moreover, it also implies
that if appropriate expectations exist, $E(T \mid W = w) = \theta$ and $\operatorname{cov}(T, W) = 0$.
Suppose that $T_1, T_2, \ldots, T_k$ and $W_1, W_2, \ldots, W_k$ represent $k$ such $T$ and
$W$ statistics, so that $W_1 + \cdots + W_k = 1$.

(a) Show that $E\left[\sum_{i=1}^{k} W_i T_i\right] = \theta$.

(b) Let $T_1 = \bar{X}$ and $T_2 = m$, the sample median. Let $W_1 = 1$ if $K < 4$ and
zero otherwise, where $K$ is the sample kurtosis, and let $W_2 = 1 - W_1$.
Consider $T = \sum_{i=1}^{2} W_i T_i$. Is its expectation equal to $\theta$? If so, note that $T$
is an adaptive unbiased estimator which equals $\bar{X}$ for certain $K$-values
and $m$ for others.


TABLE I

The Poisson Distribution

$$\Pr(X \le x) = \sum_{w=0}^{x} \frac{\mu^w e^{-\mu}}{w!}$$

                                       $\mu = E(X)$
  x    0.5    1.0    1.5    2.0    3.0    4.0    5.0    6.0    7.0    8.0    9.0   10.0
  0  0.607  0.368  0.223  0.135  0.050  0.018  0.007  0.002  0.001  0.000  0.000    ...
  1  0.910  0.736  0.558  0.406  0.199  0.092  0.040  0.017  0.007  0.003  0.001    ...
  2  0.986  0.920  0.809  0.677  0.423  0.238  0.125  0.062  0.030  0.014  0.006    ...
  3  0.998  0.981  0.934  0.857  0.647  0.433  0.265  0.151  0.082    ...    ...    ...
  4  1.000  0.996  0.981  0.947  0.815  0.629  0.440  0.285  0.173  0.100  0.055  0.029
  5         0.999  0.996  0.983  0.916  0.785  0.616  0.446  0.301    ...    ...  0.067
  6         1.000  0.999  0.995  0.966  0.889  0.762  0.606  0.450    ...    ...  0.130
  7                1.000  0.999  0.988  0.949  0.867  0.744  0.599    ...    ...  0.220
  8                       1.000  0.996  0.979  0.932  0.847  0.729    ...    ...  0.333
  9                              0.999  0.992  0.968  0.916  0.830    ...    ...    ...
 10                              1.000  0.997  0.986  0.957  0.901    ...    ...    ...
 11                                     0.999  0.995  0.980  0.947    ...    ...  0.697
 12                                     1.000  0.998  0.991  0.973    ...    ...  0.792
 13                                            0.999  0.996  0.987    ...    ...  0.864
 14                                            1.000  0.999  0.994    ...    ...  0.917
 15                                                   0.999  0.998    ...    ...  0.951
 16                                                   1.000  0.999    ...    ...  0.973
 17                                                          1.000    ...    ...  0.986
 18                                                                   ...    ...  0.993
 19                                                                   ...    ...  0.997
 20                                                                          ...  0.998
 21                                                                                0.999
 22                                                                                1.000

TABLE II

The Chi-Square Distribution*

$\Pr(X \le x)$

  r    0.01   0.025    0.05    0.95
  1   0.000   0.001   0.004    3.84
  2   0.020   0.051   0.103    5.99
  3   0.115   0.216   0.352    7.81
  4   0.297   0.484   0.711    9.49
  5   0.554   0.831    1.15    11.1
  6   0.872    1.24    1.64    12.6
  7    1.24    1.69    2.17    14.1
  8    1.65    2.18    2.73    15.5
  9    2.09    2.70    3.33    16.9
 10    2.56    3.25    3.94    18.3
 11    3.05    3.82    4.57    19.7
 12    3.57    4.40    5.23    21.0
 13    4.11    5.01    5.89    22.4
 14    4.66    5.63    6.57    23.7
 15    5.23    6.26    7.26    25.0
 16    5.81    6.91    7.96    26.3
 17    6.41    7.56    8.67    27.6
 18    7.01    8.23    9.39    28.9
 19    7.63    8.91    10.1    30.1
 20    8.26    9.59    10.9    31.4
 21    8.90    10.3    11.6    32.7
 22    9.54    11.0    12.3    33.9
 23    10.2    11.7    13.1    35.2
 24    10.9    12.4    13.8    36.4
 25    11.5    13.1    14.6    37.7
 26    12.2    13.8    15.4    38.9
 27    12.9    14.6    16.2    40.1
 28    13.6    15.3    16.9    41.3
 29    14.3    16.0    17.7    42.6
 30    15.0    16.8    18.5    43.8

*This table is abridged and adapted from "Tables of Percentage Points of the Incomplete Beta
Function and of the Chi-Square Distribution," Biometrika, 32 (1941). It is published here with
the kind permission of Professor E. S. Pearson on behalf of the author, Catherine M. Thompson,
and of the Biometrika Trustees.

TABLE IV

The t-Distribution*

$\Pr(T \le t)$

*This table is abridged from Table III of Fisher and Yates: Statistical Tables for Biological,
Agricultural, and Medical Research, published by Oliver and Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.

TABLE V

The F-Distribution*

$\Pr(F \le b)$

[Table entries are not recoverable.]

*This table is abridged and adapted from "Tables of Percentage Points of the Inverted Beta Distribution," Biometrika, 33 (1943). It is published here
with the kind permission of Professor E. S. Pearson on behalf of the authors, Maxine Merrington and Catherine M. Thompson, and of the Biometrika Trustees.






1.22 

1.26 



- -(T)/( 

1.2”b) i - ⑺ 

134 ⑻ 1(b) 垚 . 

(c )「 A 、 


1000、 


'20 


[5/(8 - X)]. 


.37 j. 

•38 备； f • 

.39 為 . 

.40 j ,j ■ 

.42 (a) 0.18. (b) 0.72. 

(c) 0 . 88 . 


APPENDIX C

Answers to Selected Exercises





CHAPTER 2 

2.1 H ; 0; 去 ； j _ 

2.2 

2.6 ze - *, 0 <z<oo; 

0 elsewhere. 

2.7 - In z, 0 < r < 1 ; 

0 elsewhere. 

2.10 5^2, 0<x 2 < 1; 

0 elsewhere. 

2.11 (3x, + 2 )/( 6 x, + 3); 

( 6 ^ + 6 x, + l)/(2)(6x, + 3) 2 . 
2.13 3 jc 2 /4; 3jcf/80. 

2.18 (b) 1 /e. 

2.20 (a) 1 . (b) - 1 . (c) 0 . 

2.21 (a) 1/y/m. 

2.31 ir. J 

2.32 1. 

2.36 

2.38 (a) i, 0. 

2.39 l-(l-y) l \ 0<^<1; 

12(1-^)", 0< 少 <1. 

2.40 g(y) = [y i -(y-\y]/6\ 

少 =1 ， 2, 3,4, 5, 6 . 

2.42 b 2 = ff\(j)i2—p\iP23) / 

[<r 2 (l — P 23 )]； 
厶 3 = (P13 — P12P23)/ 

MI-p],)]. 


1.83 

1.85 

88 

89 


90 

99 

101 


⑻！- 

(b) 

$7.80. 

7 

(a) 1.5, 0.75. (b) 0.5, 0.05. 

(c) 2 ; does not exist. 
e r /( 2 -〆)， / <ln 2 ; 2 ; 2 . 

10; 0; 2; -30. 

2^2 2^2 


1.45 0.1029 for (a), (b), (c), (d). 
(e) 0.4116. 

1.46 ! ， I • 

I 47 n ， n ，士， b ， A • 

1.48 (a) 5 - (b) ji. 

1.49 ii, 1. 


1.51 ⑻ 


3 ( 


39 、 
5 一 x , 


0 


: 0 ， 1 ， • • • ， 5. 


r 39 

5 


(b) 


C 0 ( 


39' 
4 , 


⑺ 


X- 


1.54 ⑷点， 

( b ) 岳 • 

1-56 x=0; 

12-2jc 


1,2, ... ,9. 


36 


x= 1, 2, 3,4, 5. 


1.59 

1.61 

1.63 

1.64 
1.66 
1.69 


1.71 

1.72 


1.74 

1.76 

1.79 

1.80 
1.81 


I 1.2 25 

27 » l ， 5 ， 5g _ 

⑻ l.(b) l ⑹ 2 . 

(a) 0 , x<0; 1 -( 1 -x) 3 , 
0 ^jt< 1 ; 1 , 1 <,x; 

1 ->yi- 

⑻！ . （ b) 0 . (c) 去 ■ (d) 0 . 
0,少<0; 〆 ，0<少<1; 1， 
\<y, 2y,0<y<l; 

0 elsewhere. 

1 . 1 

2^4- 

0 , x< 0 ; \-e~ x l 2 fi^x. 

\ e~ x /, 2 0 <x;0 elsewhere. 

"3^?，0<少<1; 1/6^5, 

1 <_v<4; 0 elsewhere. 

2; 86.4; — 160.8. 

3; 11; 27. 


50 


/^vf.84. 

*—^3112518 o 

0305 1014 
CHAPTER 3


3.1 40/81.

3.4 147/512.

3.6 5.

3.8 3/16.

3.10 65/81.

3.13 

(!)(!，— 3 , 

x = 

:3, 4, 5, 

3.14 5/72.

3.17 1/6.

3.18 

24 

535 • 



3.20 


ii 

T - 


3.21 

25 

T • 



3.22 0.09.

3.25 $4^x e^{-4}/x!$, $x = 0, 1, 2, \ldots$.


3.26 0.84. 

3.31 2. 

3.33 (a) exp [ — 2 + e ,2 ( 1 + 〆' )】• 
(b) /i| = 1, ^2 = 2, 

1 ， = 

P~y/2/2. 

(C) y/2. 

3.34 0.05. 

3.35 0.831, 12.8. 

3.36 0.90. 

3.37 X \A). 

3.39 ,0 〈 : v<oo, 

3.40 2,0.95. 

3.45 

3.46 x 2 (2). 


3.71 (a) 0.264. (b) 0.440. 
(c) 0.433. (d) 0.642. 

3.73 p = |. 

3.74 (38.2, 43.4). 


CHAPTER 4 


4.2 晶. 

4.3 0.405. 

4.6 H. 

4.7 

4.9 (”+1)/2; (/i 2 -1)/12. 

4.10 a + bx; b 1 ^ . 

f(2). 

5, 0<少<1.; 

l/2y, 1 <_y <oo. 
y'\ 0<,y< 1; 15 少 l4 ， 
0 <>>< 1. 

4.16 f. 

4.17 5 , _v = 3, 5, 7. 

4.19 ❾％ , 少 =1 ， 8, 27 ， 

4.20 


4.11 

4.14 

4.15 




$y_1$         1     2     3     4     6      9
$g_1(y_1)$  1/36  4/36  6/36  4/36  12/36  9/36


3.49 0.067; 0.685. 

3.51 71.3, 189.7. 

4.25 

0<y<21. 

3.52 ^/in 2/n. 

3*57 0 774 

4.32 

4.34 

e~ y \ 0 <j| <ao. 

(2 少 i)(4>^) ， 0<^,<1, 

3.58 s/ljn-, (n — 2)jn. 

3.59 0.90. 

4.35 

0 <_y 2 < i. 

• + 灼； 

3.60 0.477. 



3.61 0.461. 

4.36 

(a) 20. (b) 1260. (c) 495. 

3.62 N(0, 1). 

4.37 

10 

茄 . 

3.63 0.433. 

4.40 

0,05. 

3.64 0; 3. 

4.43 

1/4.74, 3.33. 

3.69 N(0, 2). 

4.48 


3.70 (a) 0,574. 


0 <j,<oo, 0 <>» 2 < 2 n, 

(b) 0.735. 


0 <>» 3 ^n. 
4.49 y 7 yie~ y \ 0< 凡 <1 ，
0 <>» 2 < 1 , 0<^3 < 00 . 
4.53 1/(2^), 0 <^<K 
454 e- yil2 /(2n^/y,-yl), 

~sfy\<yi<sfy^ 

0 <^,< oo. 

4.56 l-(\-e-y. 

4.57 

4.62 咅 . 

4.63 4^ziz\zl ， 0<Z! < 1, 
0 <z 2 < 1, 0 < 2 3 < 1. 

4.64 点 . 

4.69 L 

4.70 6uv(u + v), 

0<u<v< 1. 


4.75

$y$       2     3     4     5     6     7     8     9    10    11    12
$g(y)$  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36


4.76 0.24. 

4.79 0.159. 

4.82 0.159. 

4.88 0.818. 

4.91 (b) -1 or 1. 

(c) Zi=(Tj Yj-\- fij. 

4.92 X fl A = 0. 


4.94 6.41. 

4.95 n = 16. 

4.97 {n-\)a 2 jn\ 
2(n — l)(j*/n 2 . 

4.98 0.90. 

4.100 0.945. 

4.102 0.618. 


4.103 0.78. 

4.104 f 

4.105 7. 

4.107 2.5; 0.25. 

4.109 -5; 60-12y/6. 

4.110 <j {- . 

4.113 0.265. 

4.115 22.5, 爭 . 

4.116 r 2 >4. _ 

4.118 H2^l/y/o 2 \^l + ^Wl + t^Wl • 
4.121 5/^39. 

4.125 ^ +ff2/2 ;^ +0 V 2 -l)- 


CHAPTER 5

5.1 Degenerate at $\mu$.

5.2 Gamma ($\alpha = 1$, $\beta = 1$).

5.3 Gamma ($\alpha = 1$, $\beta = 1$).

5.4 Gamma ($\alpha = 2$, $\beta = 1$).

5.13 0.682. 

5.14 (b) 0.815. 

5.17 Degenerate at n 2 

5.18 (b) N(0, 1). 

5.19 (b) N(0, 1). 

5.21 0.954. 

5.23 0.840. 

5.26 0.08. 

5.28 0.267. 

5.29 0.682. 

5.35 N(0, 1). 


CHAPTER 6 

6.1 (a) X, 

(b) =”/ln(m). 

(c) X. (d) The median. 

(e) The first order statistic. 

6.2 The first order statistic Y u 

1 
64 -i II 2
6.5 K^min (X f ); 

nnngX.X^-X^/ri]. 
6.7 (b) X/(\-X). (d) X. 

(e) X-\. 

6.9 l-e~ 2/x . 

6.10 Multiply by n/(n—\). 
6.12 (7,+ ^)/2,(^-^)/2; 

E^Y H -Y,)l2] = p(n~\)l 

(«+”. 

6.14 (77.28, 85.12). 

6.15 24 or 25. 

6.16 (3.7, 5.7). 

6.17 160. 

6.23 (5jc/6, 5jc/4). 

6.25 1692. 

6.26 3.19 to 3.61. 

6.28 3.92 to 31.50. 

6.30 (-3.6, 2.0).

6.35 135 or 136. 

6.38 s + jln|; ^+|ln 3 . 

6.39 每； （ 31)3 8 /4 9 . 

6.42 «= 19 or 20. 

6.43 抑 = 0.062; 

^) = 0.920. 

6.44 nx73, c»42. 

6.46 (a) Reject. 

(b) /7-valuea 0.005. 

6.49 (c) /rvalue 0.005. 

6.51 23.3. 

6.52 2.91. 

6.53 ^ 3 =tt>7.81, 
reject H 0 . 

6.55 ft<8 or 32<b. 

6.56 仍 =f<ll_3, 
accept H 0 . 

6.57 6.4 <9.49, accept Ho. 
6.59 p = (X l + X 2 /2) / 

(X,+X 2 + X 3 ). 

CHAPTER 7 

7.4 

7.5 d^y). 


7.6 b = 0; does not exist. 

7.7 Does not exist. 


7.i7 fl -m 

7-19 60^( ^-^)/ 6 l 5 ; 6^/5; 

O 2 /!-, 0 2 /35. 

7.20 (1/6>W /0 ， 
0<^ 2 <>^,<co; 
y 、 a • 鄭 • 

7.22 I, Xj/n, I, X i /n-An+DYj/n 
1.2A X,X. 

7.25 YJn. 

7.27 

7.29 r,=£r /； r ( /4«; yes. 

7.37 x .' 

7.40 X 2 -\/n. 



7.51 

7.55 


(/ »+i)(r ff -r,) 

2 ’ 2(n-\) 

Y,)/n. 


CHAPTER 8 


8.2 

8.3 

8.9 

8.13 

8.15 

8.17 

8.22 

8.25 


[少 t z + 〆/«]/(— + a 2 In). 
P( y + (X )/(np+\). 


<y t . 

Q 2 jn\ 6 2 /n(n + 2). 
(a) 4/d 2 . 


(d) Var( ^ ) = ^j = i- 


2.17; 2.44. 
2 . 20 . 


CHAPTER 9 

10 

9.4 18.3; yes; yes. 




11.15 

11.18 

11.25 

11.37 

11.44 


⑻ H.(b) 675/1024; 

(c) (0.8) 4 . 

8 . 

0.954; 0.92; 0.788. 

8 . 

(a) Beta (n—j+ \,j). 

(b) Beta (n—j+i—\, 

“ j-i+2). 

0.067. 

Reject H 0 . 

0; 4(4"—1)/3; no. 

2 ' 

99 - 

98; 苧 . 


CHAPTER 10 

10.9 6.39. 

10.12 r+6, 2r + 40. 

10.13 r 2 (0 + r,)/[r,(r 2 -2)]. 
10.23 7.00, 9.98. 

10.25 4.79, 22.82, 30.73. 

10.26 (a) 4.483 jc+6.483. 

10.28 卜 mine,), 

10.32 Reject H 0 . 

10.44 a/ = 0, / = 1, 2, 3, 4. 

10.45 E a/, = 0, i=l, 2,..., n. 

y* 1 1 

CHAPTER 11 


10 10 

9.6 3 乞 4 + 2 艺 x, 之 c. 

I I 

9.7 95 or 96; 76.7. 

9.9 38 or 39; 15. 

9.10 0.08; 0.875. 

9.11 (1-0) 9 (1+90). 

9.12 1, O<0<i; 1/(16^), 
i<0<l; 1 — 15/(16 #)， 

1 < 0 . 

9.14 53 or 54, 5.6. 

9.17 Reject H 0 if 3c^ 77.564. 

9.18 26 or 27; 

reject H 0 ifx< 24. 

9.19 220 or 221; 
reject H 0 if y>\7. 

9.23 / = 3 >2.262, reject H 0 . 

9.24 I r| = 2.27 >2.145, 
reject H 0 . 

9.37 c 0 (n) = (14.4) 

x (rt In 1.5 —In 9.5); 
咖 )=(14.4) 

x (n In 1.5 + ln 18). 

9.38 c 0 (/j)==(0.05n —In 8 )/ln 3.5; 
c,(n) = (0.05/j-In 4.5)/ln 3.5. 

9.41 (b) c = 0.18; 0.64, 

(c) c = 0.5; 0.16; 0.84. 

(d) c = 0.23; 0.06; 0.68. 

9.44 (9>--20jc)/30<c. 


2 4 6 9 1 
■■*** 
11111 
1 1M 




Index 


Absolute-error loss function, 311, 367 
Adaptive methods, 536, 542
Algebra of sets, 4
Analysis of variance, 466
Ancillary statistic, 347, 353
Andrews, D. F., 393

Approximate distribution, 248, 251, 381, 392, 
525 

chi-square, 295, 422 
normal for binomial, 249, 499 
normal for chi-square, 244 
normal for Poisson, 246 
Poisson for binomial, 244 
Arc sine transformation, 252, 273 
Asymptotically efficient, 379 

Basu, D., 354

Bayes’ formula, 23, 364 

Bayesian methods, 363, 437 

Bernoulli trials, 116

Bernstein, S., 112

Best critical region, 396, 399, 402 

Beta distribution, 180, 504 

Biased estimator, 263 

Binary statistic, 514 

Binomial distribution, 118, 244, 249, 254, 498, 
506 

Bivariate normal distribution, 146, 212, 226,
346, 385, 439, 478
Boole’s inequality, 465
Borel measurable function, 29, 156
Box-Muller transformation, 177 
Burr distribution, 372 


Cauchy distribution, 175, 257, 387 
Censoring, 49 

Central limit theorem, 246, 511 
Change of variable, 163, 168, 186 
Characteristic function, 64 
Characterization, 202, 214 
Chebyshev’s inequality, 68, 120, 222, 240 
Chi-square distribution, 134, 144, 210, 294, 
447, 482, 489, 491 
Chi-square test, 293, 424
Classification, 439, 4% 

Cochran's theorem, 490 
Column effect, 467, 470 
Complement of a set, 7 
Complete sufficient statistics, 332, 335, 343, 
353, 537 

Completeness, 329, 343 
Composite hypothesis, 284, 288, 406, 413 
Compounding, 372 
Conditional distribution, 82, 148 
Conditional expectation, 84, 110 
Conditional mean, 85, 93, 123, 148, 357, 367 
Conditional probability, 83 
Conditional p.d.f., 83, 109, 148, 364 
Conditional variance, 85, 95, 148, 357
Confidence coefficient, 270 
Confidence interval, 268, 289, 462 
for difference of means, 276 
for means, 268, 462 
for p, 272 
for quantiles, 497 
for ratio of variances, 280 
for regression parameters, 473 
for variances, 276 


Contingency tables, 299 
Continuous-type random variables, 37, 108, 
168, 193 
Contrasts, 464 
Control limits, 431 
Convergence, 233, 239, 240 
in distribution, 233 
in probability, 239 
with probability one, 240 
Convolution formula, 178 
Correlation coefficient, 91, 100, 105, 123, 150, 
377, 478, 534 
Covariance, 92, 386, 535 
Covariance matrix, 227, 385 
Coverage, 503 
Cramer, 243 
Critical region, 282, 284 
best, 396, 399, 402 
size, 285 

uniformly most powerful, 405
Curtiss, J. H., 243
CUSUMS, 432 

Decision function, 308, 433 
Degenerate distribution, 65, 235 
Degrees of freedom, 134, 298, 422 
Delta method, 251
Dependence, 101
Design matrix, 493 
Discrete-type random variable, 28 
Distribution 
Bernoulli, 116 
beta, 180, 504 

binomial, 118, 244, 249, 254, 498, 506 
bivariate normal, 146, 212, 226, 346, 385, 
439, 478 
Burr, 372 

Cauchy, 175, 257, 387 
chi-square, 134, 144, 210, 294, 447, 482, 489,
491
conditional, 82, 109, 148, 364
continuous-type, 37, 108
of coverages, 503
degenerate, 65, 235
Dirichlet, 188, 371, 504
discrete-type, 28
double exponential, 176
empirical, 158
exponential, 133, 203
exponential class, 333, 343
of F, 182, 221, 421, 451, 463
function, 34, 37, 44, 78, 108, 501

of functions of random variables, 155 

of 161 


gamma, 131, 202 
geometric, 121
hypergeometric, 34, 56, 517
limiting, 233, 237, 243, 253, 294, 380
of linear functions, 208
logistic, 178
lognormal, 154, 222
marginal, 80, 93, 101, 109
multinomial, 121, 199, 295, 515, 527
multivariate normal, 223, 294, 482
negative binomial, 121
of noncentral chi-square, 301, 458
of noncentral F, 458, 468
of noncentral T, 420, 460 
normal, 138, 143, 147, 208, 214, 247, 381, 
446 

ofnS 2 /^ 214 

of order statistics, 193, 258 

Pareto, 267 

Poisson, 126, 166, 244 

posterior, 367, 493 

prior, 367, 493 

of quadratic fonns, 447, 481 

of R, 480 

of runs，518 

of sample, 158 

of sample mean, 214, 220, 249 
of sample variance, 214 
of r ， 181 ， 217, 218, 238, 277, 356, 415, 419, 
476, 480 

trinomial, 122, 371 
truncated, 146 
uniform, 48， 160 
Weibull ， 137, 201,372 
Distribution-free methods, 498, 506, 537 
Distribution function, 34, 37, 44, 78, 108, 501 
Distribution-function technique, SO, 1S9 
Double exponential distribution, 176 


Efficiency, 377 
Element, 3 

Empirical distribution, 158 
Equally likely events, 15 
Estimation, 259, 307, 363 
Bayesian, 363
interval, 268, 370
maximum likelihood, 261, 324, 380, 385, 389
method of moments, 266
minimax, 309
minimum chi-square, 298
point, 259, 307
robust, 387
unbiased, 263, 307, 340, 381, 542
unbiased minimum variance, 307, 326, 332,
338

Estimator, 259, 307, 363 
consistent, 264, 384 
efficient, 377 

maximum likelihood, 262, 324, 380, 385, 389
minimum chi-square, 298 
minimum mean-square-error, 310 
unbiased, 263, 307, 340, 381, 542 
unbiased minimum variance, 307, 326, 332, 
338 

Events, 2 
equally likely, 15 
exhaustive, 15, 22, 297 
independent, 24, 27, 103 
mutually exclusive, 15, 22, 297 
Expectation (expected value), 52, 87, 109, 205, 
218 

conditional, 84 
of a product, 104 
Exponential class, 333, 343 
Exponential distribution, 133, 203 

F-distribution, 182, 221, 421, 451, 463
Factorial moment, 65 
Factorization theorem, 318, 334, 341 
Family of distributions, 260, 330 
Fisher, R. A., 372, 441
Fisher’s information, 372, 385
Fisher's linear discriminant function, 441
Frequency, 2 
relative, 2, 12, 17
Function, characteristic, 64
decision, 308, 433 

distribution, 34, 37, 44, 78, 108, 501
gamma, 131
likelihood, 261, 416
loss, 308, 367, 434 

moment-generating, 59, 97, 105, 111, 203, 
209, 243, 486, 510 
of parameter, 338 
point, 7 

power, 282, 285, 443 

probability density, 33, 39, 45, 50, 76, 108, 
397 

probability distribution, 34, 37, 44
probability set, 12, 29, 47, 75, 108 
of random variables, 155 
risk, 308, 368, 434 
set, 7 

Gamma distribution, 131, 202
Gamma function, 131 
Geometric distribution, 121 


Geometric mean, 336 
Gini’s mean difference, 203 
Goal post loss function, 311 
Gosset, W. S., 182 

Huber, P., 390

Hypergeometric distribution, 34, 56, 517 
Hypothesis, 280, 284, 395 

Independence, 19, 24, 111, 150, 157, 353, 480, 
537 

of linear forms, 214, 228 
mutual, 25, 111 
pairwise, 25, 112 

of quadratic forms, 447, 481, 486
test of, 478
of $\bar{X}$ and $S^2$, 217, 231, 354
Independent events, 24, 27 
Independent random variables, 100, 157, 167, 
176 

Independent trials, 26 
Inequality
Boole, 465
Chebyshev, 68, 120, 222, 240
Rao-Blackwell, 90, 326
Rao-Cramer, 372 
Information, Fisher’s, 372, 385
Interaction, 469 
Intersection of sets, 5 
Interval 

confidence, 270, 289, 462 
prediction, 149, 275 
random, 269 
tolerance, 500, 503 

Invariance property of maximum likelihood 
estimation, 265, 474 

Jacobian, 179, 186, 224

Joint conditional distribution, 110 

Joint distribution function, 79 

Joint probability density function, 79, 111, 397

Kurtosis, 66, 539 

Law of large numbers, 120, 222, 240 

Law of total probability, 23 

Least squares, 473 

Lehmann alternative, 529 

Lehmann-Scheffe, 332 
Likelihood function, 261, 416

Likelihood principle, 312, 324 
Likelihood ratio tests, 409, 413, 422, 452, 467
Limiting distribution, 233, 237, 243, 253, 294, 
380 

Limiting moment-generating function, 243 
Linear discriminant function, 441
Linear functions, 208 
covariance, 223 
mean, 219 

moment-generating function, 209, 228 
variance, 219 
Linear model, 472 
Linear rank statistic, 529
Location-invariant statistic, 351, 355, 443, 542
Logistic distribution, 178 
Lognormal distribution, 154, 222 
Loss function, 308, 367, 434

Mann-Whitney-Wilcoxon, 521 
Marginal probability density function, 80, 93,
101, 109 

Mathematical expectation, 52, 88, 109, 205, 
218 

Maximum likelihood, 261 
estimator, 261, 324, 380, 385, 389
method of, 261 
Mean, 58, 158 

conditional, 85, 93, 123, 148, 357, 367 
of distribution, 58
of linear function, 219, 530 
of a sample, 158 
of X, 58
of $\bar{X}$, 220
Median, 44, 57 
Median test, 516 
M-estimators, 387
Method 

of least squares, 473 
of maximum likelihood, 261 
of moments, 266 
Midrange, 200 
Minimax, criterion, 309, 433
decision function, 310, 433
Minimum chi-square estimates, 298 
Minimum mean-square-error estimates, 310
Mode, 43 

Model, 15, 40, 78, 325, 472 
Moment-generating function, 59, 97, 105, 111,
203, 209, 243, 486, 510 
of binomial distribution, 118 
of bivariate normal distribution, 149 
of chi-square distribution, 134 
of gamma distribution, 133 
of multinomial distribution, 123 


of multivariate normal distribution, 226 
of noncentral chi-square distribution, 448 
of normal distribution, 139 
of Poisson distribution, 128 
of trinomial distribution, 123 
of $\bar{X}$, 209
Moments, 62, 65 
factorial, 65 
method of, 266 

Monotone likelihood ratio, 409 
Most powerful test, 397, 405, 443 
Multinomial distribution, 121, 199, 295, 515, 
527 

Multiple comparisons, 461 
Multiplication rule, 21 

Multivariate normal distribution, 223, 294,
482
Mutually exclusive events, 15, 30, 297
Mutually independent events, 25, 111 
Mutually independent random variables, 111 

Negative binomial distribution, 121 
Neyman factorization theorem, 318, 334, 341
Neyman-Pearson theorem, 397, 438 
Noncentral chi-square, 301, 458 
Noncentral F, 458, 468 
Noncentral parameter, 420, 459, 468, 485 
Noncentral T, 420, 460
Nonparametric methods, 497, 506, 536 
Normal distribution, 138, 143, 147, 208, 214,
247, 381, 446
Normal equations, 493 
Normal scores, 514, 533 
Null hypothesis, 288, 413 
Null set, 5, 13

Observations, 156 
One-sided test, 288 

Order statistics, 193, 258, 347, 498, 501, 527 
distribution, 193, 258 
functions of, 200, 503
Outcome, 1, 12
Pairwise independence, 25, 112
Parameter, 118, 129, 134, 143, 420 
function of, 338 
location, 143, 350, 388 
scale, 144, 351 
shape, 144 

Parameter space, 260, 415, 434
Pareto distribution, 267
Partition, 15, 22, 315
Percentile, 44, 505
Personal probability, 3, 367 
PERT, 203 

Point estimate, 259, 307 
Poisson distribution, 126, 166, 244 
Poisson process, 127, 132, 137
Posterior distribution, 23, 367, 493 
Power, 282, 397, 405 
function, 282, 285 
Power of a test, 285, 397, 405, 443 
Prediction interval, 149, 275 
Prior distribution, 23, 367, 493 
Probability, 2, 12, 17, 29 
conditional, 19, 23, 83 
induced, 29, 37 
measure, 2, 12, 29 
models, 15, 17, 47, 78
posterior, 23 
prior, 23 
subjective, 3, 367 

Probability density function, 33, 39, 45, 50, 76,
108, 397 

conditional, 83, 109 
exponential class, 333, 343 
joint, 79, 111, 397 
marginal, 80, 93, 101, 109 
posterior, 367, 493 
prior, 367, 493 

Probability set function, 12, 29, 47, 75, 108 
p-values, 291 

Quadrant test, 540 
Quadratic forms, 446, 481 
distribution, 447, 481 
independence, 447, 481 
Quantiles, 44, 258, 497 
confidence intervals for, 497 

Random experiment, 2, 28, 259 
Random interval, 269 
Random sample, 157, 209, 211 
Random sampling distribution theory, 155 
Random variable, 28, 37, 49, 74, 87, 108, 155 
continuous-type, 37, 108, 168, 193 
discrete-type, 28, 108, 164 
mixture of types, 48, 67 
space of, 32, 39, 74, 108 
Random walk, 431 
Randomized test, 291, 412 
Range, 200 

Rank, 509, 525, 527, 529 
Rao-Blackwell theorem, 90, 326 
Rao-Cramer inequality, 372 
Real quadratic form, 446 


Redescending M-estimator, 393
Regression, 471, 493 
coefficients, 473, 493 
Regular case, 334, 345, 373, 410 
Relative frequency, 2, 12, 17 
Risk function, 308, 368, 434 
Robust methods, 387, 538
Row effect, 467, 470
Run, 517 
Run test, 519 

Sample, 155, 159, 211 
correlation coefficient, 478 
mean of, 158, 220, 249
median of, 200 
random, 158, 209, 211 
size, 158, 274, 428 

space, 2, 12, 20, 28, 74, 259, 281, 413
standard deviation, 158 
variance, 158, 214 

Scale-invariant statistic, 351, 355, 443 
Scheffe, H., 462

Sequential probability ratio test, 425 
Set, 3, 10 

complement of, 7, 13
of discrete points, 33 
element of, 3 
function, 7, 12 
null, 5 

probability measure, 2, 12
subset, 4, 13, 20
Sets, 3 
algebra, 4 
intersection, 5 
union, 5, 13 
Shewhart, W., 431

Sigma field, 13, 29 

Significance level of test, 285, 400, 429 
Sign test, 506 
Simulation, 161, 177 
Size of critical region, 285 
Skewness, 66, 358

Slutsky’s theorem, 254, 381 
Space, 6 

parameter, 260, 415, 434 
product, 101 

of random variables, 32, 39, 74, 108 
sample, 2, 12, 20, 28 
Spearman rank correlation, 534, 540 
Square-error loss function, 310, 367 
Standard deviation, 59, 158 
Standard normal distribution, 143, 247, 
Statistical hypothesis, 280, 284, 395
alternative, 281, 288, 406, 413, 527 
composite, 284, 288, 406, 413 
null, 288, 413 
one-sided, 288 
simple, 284, 397, 406, 426 
test of, 280, 284, 395 
two-sided, 288 
Statistical inference, 259
Student’s t, 182, 358
Subjective probability, 3, 367 
Subset, 4, 13, 20 

Sufficient statistic(s), 314, 322, 332, 335, 353, 
409, 537 
joint, 341, 537 

t-distribution, 181, 217, 218, 238, 277, 356,
415, 419, 476, 480 
Technique 

change of variable, 163, 168, 186 
distribution function, 50 
moment-generating function, 203 
Test of a statistical hypothesis, 280, 284, 395 
best, 395, 399, 402 
chi-square, 293, 424 
critical region of, 284 
of equality of distributions, 514 
of equality of means, 290, 356, 452 
of equality of variances, 293, 356 
of independence, 478 
likelihood ratio, 409, 413, 422, 452, 467 
median, 516 
nonparametric, 497 
paired t, 306

power of, 282, 285, 397, 405, 443 
of randomness, 520 
run, 517 

sequential probability ratio, 425
sign, 506 

significance level, 285, 400, 429
uniformly most powerful, 405 


for variances, 293 
Tolerance interval, 500, 503 
Total probability, 23 
Training sample, 441 
Transformation, 163, 168, 177 
of continuous-type variables, 168 
of discrete-type variables, 163 
not one-to-one, 188 
one-to-one, 163, 168, 186 
Trinomial distribution, 122, 371
Truncated distribution, 146
Types I and II errors, 286, 429 

Unbiased, 17 

Unbiased estimator, 263, 307, 340, 381, 542 
Unbiased minimum variance estimator, 307, 
326, 332, 338 

Uniform distribution, 48, 160 
Uniformly most powerful test, 405 
Union of sets, 5 
Uniqueness 
of estimator, 329, 335 
of characteristic function, 64 
of moment-generating function, 59 

Variance, 58, 89, 276 
analysis of, 466 
conditional, 85, 95, 148, 357 
of a distribution, 58 
of a linear function, 219, 530 
of a sample, 158 
of X, 58
of $\bar{X}$, 220

Variance-covariance matrix, 227, 385 
Venn diagram, 6

Wald, 427 

Weibull distribution, 137, 201, 372 
Weight function, 388 
Wilcoxon, 508, 521, 539
Confidence Intervals for Means: Normal Assumptions

$\bar{x} \pm a\sigma/\sqrt{n}$, where $\Phi(a) = 1 - \alpha/2$, for $\mu$ with $\sigma^2$ known

$\bar{x} \pm bs/\sqrt{n-1}$, where $\Pr(T \le b) = 1 - \alpha/2$, for $\mu$ with $\sigma^2$ unknown
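
As a rough illustration of the first interval (variance known), the following minimal Python sketch computes $\bar{x} \pm a\sigma/\sqrt{n}$ using only the standard library. The function name `mean_ci_known_sigma` and the sample values are hypothetical and are not taken from the text.

```python
from math import sqrt
from statistics import NormalDist, fmean


def mean_ci_known_sigma(sample, sigma, alpha=0.05):
    """Return (low, high) for x_bar +/- a*sigma/sqrt(n), where Phi(a) = 1 - alpha/2."""
    n = len(sample)
    x_bar = fmean(sample)
    # a is the standard normal quantile with Phi(a) = 1 - alpha/2
    a = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = a * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical data; sigma = 0.2 is assumed known for the sake of the example.
data = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0]
low, high = mean_ci_known_sigma(data, sigma=0.2)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

The second interval has the same shape, with $b$ taken from a table of the $t$-distribution in place of $a$ and $s/\sqrt{n-1}$ in place of $\sigma/\sqrt{n}$.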
Approximate Confidence Intervals for Binomial Parameters

One-Sided Tests of Hypotheses: Normal Assumptions
$H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$. Reject $H_0$ if

$$\frac{\bar{x} - \mu_0}{s/\sqrt{n-1}} \ge c, \quad \text{where } \Pr(T \ge c) = \alpha$$

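$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 > \mu_2$. Reject $H_0$ if

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \ge d, \quad \text{where } \Pr(T \ge d) = \alpha$$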
One-Sided Test of Hypothesis About p

$H_0: p = p_0$ against $H_1: p > p_0$. Reject $H_0$ if

$$\frac{y/n - p_0}{\sqrt{p_0(1 - p_0)/n}} \ge k, \quad \text{where } \Phi(k) = 1 - \alpha$$
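
The test about $p$ can be sketched in a few lines of Python. This is only an illustration: it assumes the numerator of the statistic is the observed proportion $y/n$ minus $p_0$, where $y$ is the number of successes in $n$ trials, and the function name `one_sided_p_test` and the numbers used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_test(y, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p > p0 with the normal approximation.

    Returns the statistic z, the cutoff k with Phi(k) = 1 - alpha,
    and whether H0 is rejected (z >= k).
    """
    z = (y / n - p0) / sqrt(p0 * (1 - p0) / n)
    k = NormalDist().inv_cdf(1 - alpha)
    return z, k, z >= k


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z, k, reject = one_sided_p_test(y=60, n=100, p0=0.5)
print(f"z = {z:.3f}, k = {k:.3f}, reject H0: {reject}")
```
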
Chi-Square Test 

Reject null hypothesis concerning probabilities if

$$\sum_{\text{all cells}} \frac{(\text{Obs}_i - \text{Exp}_i)^2}{\text{Exp}_i} \ge h,$$

where $h$ is the $100(1 - \alpha)$
percentile of $\chi^2(r)$, where $r$ is the difference of the
dimension of the total parameter space and that of
the parameter space under the null hypothesis
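
A small Python sketch of the chi-square statistic follows, under stated assumptions: the die-rolling counts are made up, the function name `chi_square_statistic` is hypothetical, and the cutoff $h = 11.07$ is the tabulated 95th percentile of $\chi^2(5)$ entered by hand rather than computed. Here $r = 5$ because six cell probabilities that sum to one span five dimensions and the null hypothesis of a fair die fixes all of them.

```python
def chi_square_statistic(observed, expected):
    """Sum over all cells of (Obs_i - Exp_i)^2 / Exp_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [13, 19, 11, 8, 5, 4]
expected = [10] * 6

q = chi_square_statistic(observed, expected)
h = 11.07  # 95th percentile of chi-square(5), taken from a table
reject = q >= h
print(f"statistic = {q:.2f}, cutoff = {h}, reject H0: {reject}")
```
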



Other Important Concepts 

Sufficient Statistics 

The statistic $u(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if
and only if the joint p.d.f. of $X_1, X_2, \ldots, X_n$ equals

$$k_1[u(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n)$$

where the function $k_2$ does not depend upon $\theta$.

Let $Y_1$ and $Y_2$ be two statistics such that $Y_1$ is sufficient for
$\theta$ and $Y_2$ is an unbiased estimator of $\theta$. Then $E(Y_2 \mid Y_1) = \varphi(Y_1)$
is unbiased and $\operatorname{var}[\varphi(Y_1)] \le \operatorname{var}(Y_2)$.

If the random sample arises from a distribution with p.d.f.
$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)]$, $a < x < b$,
then $Y_1 = \sum_{i=1}^{n} K(X_i)$ is a complete sufficient statistic for $\theta$.
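
As an illustration that is not part of the summary itself, the exponential distribution with mean $\theta$ can be written in this form. The identification of $p$, $K$, $S$, and $q$ below is a worked sketch for this one family only.

```latex
f(x;\theta) = \frac{1}{\theta}\, e^{-x/\theta}
            = \exp\!\left[ -\frac{1}{\theta}\, x + 0 - \ln\theta \right],
\qquad 0 < x < \infty,
% so p(\theta) = -1/\theta,  K(x) = x,  S(x) = 0,  q(\theta) = -\ln\theta, and
Y_1 = \sum_{i=1}^{n} K(X_i) = \sum_{i=1}^{n} X_i .
```

For this family the sum of the observations is therefore the complete sufficient statistic for $\theta$.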

If $Y_1$ and $Y_2$ are statistics such that $Y_1$ is complete and sufficient
for $\theta$ and $Y_2$ has a distribution that does not depend upon
$\theta$, then $Y_1$ and $Y_2$ are independent.

Maximum Likelihood Estimators and Related Tests

The maximum likelihood estimator $\hat{\theta}$, which maximizes $L(\theta)$,
the joint p.d.f. of the random variables, is a function of the
sufficient statistic if it exists. Under certain conditions, $\hat{\theta}$ has
an approximate normal distribution with mean $\theta$ and variance
$1/[nI(\theta)]$, where the Fisher information $I(\theta)$ equals

$$E\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial \theta}\right]^2\right\},$$

and thus is asymptotically efficient.

The region defined by $L(\theta')/L(\theta'') \le k$ provides a best critical
region for testing $H_0: \theta = \theta'$ against $H_1: \theta = \theta''$.

A likelihood ratio test is defined by $\lambda = L(\hat{\omega})/L(\hat{\Omega}) \le \lambda_0$,
where $L(\hat{\omega})$ and $L(\hat{\Omega})$ represent, respectively, the maxima of the
likelihood function in the parameter space $\omega$ and $\Omega$, $\omega \subset \Omega$.

Let $Q, Q_1, \ldots, Q_{k-1}, Q_k$ be quadratic forms in normally
distributed random variables such that $Q$ is $\chi^2(r)$,
$Q_i$ is $\chi^2(r_i)$, $i = 1, 2, \ldots, k-1$, $Q_k \ge 0$, and
$Q = Q_1 + \cdots + Q_{k-1} + Q_k$. Then $Q_1, \ldots, Q_k$ are
independent and $Q_k$ is $\chi^2(r - r_1 - \cdots - r_{k-1})$.
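
To make the Fisher information in the maximum likelihood summary concrete, here is a minimal Python sketch for a single Bernoulli observation with $f(x; p) = p^x (1-p)^{1-x}$, $x = 0, 1$. The choice of the Bernoulli family and the function names are assumptions made for illustration; the code evaluates the expected squared derivative of $\ln f$ directly and compares it with the closed form $1/[p(1-p)]$.

```python
def fisher_information_bernoulli(p):
    """I(p) = E[(d/dp ln f(X; p))^2] for f(x; p) = p^x (1 - p)^(1 - x)."""

    def score(x):
        # Derivative of x*ln(p) + (1 - x)*ln(1 - p) with respect to p.
        return x / p - (1 - x) / (1 - p)

    # Expectation over the two outcomes: Pr(X = 1) = p, Pr(X = 0) = 1 - p.
    return sum(score(x) ** 2 * (p if x == 1 else 1 - p) for x in (0, 1))


def approx_mle_variance(p, n):
    """Approximate variance 1/[n I(p)] of the maximum likelihood estimator."""
    return 1 / (n * fisher_information_bernoulli(p))


info = fisher_information_bernoulli(0.3)
print(f"I(0.3) = {info:.4f}, closed form = {1 / (0.3 * 0.7):.4f}")
print(f"approximate variance with n = 100: {approx_mle_variance(0.3, 100):.5f}")
```

The approximate variance reduces to $p(1-p)/n$, which is the exact variance of the sample proportion in this case.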
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